(54) Compressing and decompressing data

(57) In a tag document compressing/decompressing technique, a tag document compressing apparatus
(2), for example, has a tag extracting unit (30) for scanning document type definition of an inputted tag
document to extract a tag, a tag code table creating unit (40) for assigning a predetermined code to the tag
in the document type definition on the basis of the tag extracted by the tag extracting unit (30) to create a
tag code table, and a tag coding unit (60) for coding the tag in document instance on the basis of the tag
code table created by the tag code table creating unit (40) so as to compress the document in consideration
of the tag in the tag document, thereby improving a compression rate of the tag document and decreasing a
quantity of data of the same.
Description

[0001] The present invention relates to a technique of compressing and decompressing data, particularly, to an
apparatus, a method and a recording medium suitable for use when a document (a tag document) structured and described
according to control characters (strings) called tags defining a document structure is compressed and decompressed.
[0002] A recent trend is to unify formats of documents handled by computers, an aim of which is to be able to handle
formats of documents, which have differed from computer to computer, or from application to application, in different
computer environments.

[0003] As an example, there is an international standard (ISO 8879) for a document format called SGML (Standard
Generalized Markup Language) established by ISO in 1986. An SGML document consists of, as schematically shown
in FIG. 31, three portions, that is, SGML declaration 301, document type definition (DTD: Document Type Definition)
302 and document instance 303.

[0004] The SGML declaration 301 is a portion declaring a character set and the like necessary to process an SGML
document in another system. The DTD 302 is a portion defining a structure of a document such as chapter, paragraph,
title, etc., which is described in a format as shown in FIG. 32, for example. The DTD 302 shown in FIG. 32 is a portion
of DTD of HTML (Hyper Text Markup Language), which is a kind of SGML spread as a description format on the World
Wide Web (WWW) of the Internet.

[0005] The document instance 303 is a body of the SGML document, which is made by a writer (user) using an editor
of the computer while referring to the DTD 302. Concretely, the document instance 303 is described using controlling
characters (strings) showing elements generally called tags. Each of the tags is defined in the above DTD 302, which
represents what is an element in a document instance 303 (for example, the element is a title, a chapter, or the like).
[0006] FIG. 33 is a diagram showing an example of description of the document instance 303. In FIG. 33, a character
string (<TITLE>, </TITLE>, <SECTION>, </SECTION>, etc.) sandwiched between "<" and ">", or "</" and ">" is a tag. As
shown in FIG. 33, a portion described as:

<TITLE>«W (#|g) WaiS</TITLE> 

represents that characters (strings) sandwiched between <TITLE>, which is a start-tag, and </TITLE>, which is an end-tag,
are an element (a name of a title).

[0007] There is now a strong movement to employ SGML with public organizations in the forefront, e.g. the National 
Military Establishment of U.S.A. imposes a requirement to describe a document in SGML when submitting it. In Japan, 
the Patent Office has decided to employ SGML for CD-ROM publications. 

[0008] Meanwhile, various types of data such as character codes, vector information, image information, etc. are
handled in computers, and a quantity of data is rapidly increasing at present. Therefore, a computer generally eliminates
redundant portions in data to compress a quantity of the data so as to decrease a storage capacity for the data, or
enable a high-speed data transmission, when handling a large quantity of data.

[0009] There are several applications of data compressing techniques. An archiver and a compressing drive are
described here as examples of applications of data compression used in computers.

[0010] The archiver is a manner of compressing one or a plurality of data files, and collecting them into one file. By
using the archiver on a file rarely used or an old file, it is possible to decrease a capacity of the file. When a server
supplies files (data, application or the like) through a personal computer communication or Internet, it is possible to
reduce communication cost and a labor of transferring by collecting all the files into one using the archiver.
[0011] On the other hand, the compressing drive is a manner of compressing data on a disk such as a hard disk (HD),
a floppy disk (FD) or the like of a computer as a unit. By designating an arbitrary disk drive, all files in the designated
drive are compressed and held. In the compressing drive, a compressing/decompressing process is generally
performed in a background of the computer, so that compression/decompression (decompression at the time of reading,
and compression at the time of writing) is automatically performed in ordinary operations (read/write) by the user.
Therefore, it looks to the user that a size of the designated disk system is increased since the user is not at all conscious
of compression/decompression of data.

[0012] As a coding system used in these examples of application, a universal coding system, in which the
efficiency of compression is not much dependent upon the characteristics of the data, is often used, since various data
such as text, machine language, image, voice, etc. are handled in the computer.

[0013] The universal coding is classified into LZ-coding which utilizes repeatability of a character and statistical coding
which codes a probability of occurrence of a character. The LZ-coding stores a character (string) having occurred in the
past in a buffer, and outputs a start position in the buffer and a coinciding length as coded data when the same character
(string) occurs. The statistical coding calculates a probability (frequency) of occurrence of a character having occurred
in the past, and outputs a code according to the probability of occurrence. The LZ-coding can accomplish a high-speed
process, whereas the statistical coding can accomplish a high-compression rate.
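The LZ-coding idea described above can be illustrated with a minimal sketch. It is not the coder of any particular product or of this application: the function names, the window size and the minimum match length are assumptions chosen for illustration, and the position of a match is expressed as a distance back from the current position rather than as an absolute buffer address.

```python
def lz_encode(data, window=255, min_match=3):
    """Greedy LZ77-style coder: emits ('lit', ch) or ('ref', distance, length)."""
    out, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the buffer of past characters for the longest coinciding string.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append(("ref", best_dist, best_len))  # position and coinciding length
            i += best_len
        else:
            out.append(("lit", data[i]))              # no useful match: literal character
            i += 1
    return out


def lz_decode(tokens):
    """Rebuilds the text by copying earlier output for every ('ref', ...) token."""
    buf = []
    for tok in tokens:
        if tok[0] == "lit":
            buf.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                buf.append(buf[-dist])  # works for overlapping copies as well
    return "".join(buf)
```

Applied to a string in which the same tags recur, such as "<TITLE>abc</TITLE><TITLE>abc</TITLE>", the second occurrence is emitted as a single reference token, which is the repeatability that LZ-coding exploits.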

[0014] The data compressing techniques are normally used to decrease a data capacity of the computer or a
communication cost. As to a document file, it is possible to compress the whole document so as to manage a large volume
of documents.

[0015] In the document instance 303 of the SGML document, a quantity of data of the document is increased since
tags defining elements in the document are added to the document itself. A study on an SGML document revealed that
a proportion of tags in the document exceeds forty percent. Not only documents submitted to public agencies but also
manuals attached to products are increasingly produced in SGML format, recently. Such a manual may have several
tens to, sometimes, several hundred pages, and is frequently reused. If a history of the revision is included, a quantity of
data of the manual is enormous.

[0016] If an SGML document is compressed using the above universal coding or other coding system in the same way
as ordinary documents or documents in another format, it is possible to decrease a quantity of the data to some extent.
However, the above techniques are quite inefficient since a coding system heretofore used is merely applied to the SGML
document as a whole, without consideration given in the compression to tags occupying a large portion of the document.

[0017] Documents including tags such as SGML documents are referred to as "tag documents" below.
[0018] In the light of the above problems, an embodiment of the present invention may improve a compression rate 
of a tag document and decrease a quantity of data thereof by compressing and decompressing the document in con- 
sideration of tags in the tag document. 

[0019] The present invention therefore provides a tag document compressing apparatus for coding a tag document
having a document type definition defining a tag showing a document structure and a document instance described
using the tag defined in the document type definition to compress the tag document comprising a tag extracting unit for
scanning the document definition of an inputted tag document to extract the tag, a tag code table creating unit for
assigning a predetermined code to the tag in the document definition on the basis of the tag extracted by the tag
extracting unit to create a tag code table, and a tag coding unit for coding the tag in the document instance on the basis
of the tag code table created by the tag code table creating unit.

[0020] The present invention also provides a tag document compressing method for coding a tag document having a
document type definition defining a tag showing a document structure and a document instance described using the
tag defined in the document type definition to compress the tag document comprising the steps of assigning a
predetermined code to the tag in the document type definition to create a tag code table, and coding the tag in the
document instance on the basis of the tag code table.

[0021] According to the tag document compressing apparatus and compressing method of this invention, a
predetermined code is assigned to a tag in the document type definition to create a tag code table, and the tag in the
document type definition is coded on the basis of the tag code table. It is therefore possible to compress tags in a tag
document very efficiently, and largely decrease a quantity of data of the tag document.
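As a rough illustration of the three units named above (tag extraction, tag code table creation, tag coding), the following sketch scans element declarations in a DTD, assigns consecutive integers as the "predetermined code", and replaces tags in the instance by those integers. The function names, the regular expressions and the choice of consecutive integers are assumptions for illustration only; grouped declarations such as `<!ELEMENT (A|B) ...>` and tags with attributes are deliberately ignored.

```python
import re


def extract_tags(dtd):
    """Tag extracting unit (sketch): one start-tag and one end-tag per element name."""
    names = re.findall(r"<!ELEMENT\s+([A-Za-z][\w.-]*)", dtd)
    tags = []
    for name in names:
        tags += ["<%s>" % name, "</%s>" % name]
    return tags


def build_tag_code_table(tags):
    """Tag code table creating unit (sketch): code = order of extraction."""
    return {tag: code for code, tag in enumerate(tags)}


def code_tags(instance, table):
    """Tag coding unit (sketch): tags become integer codes, other text is kept."""
    tokens = []
    for piece in re.split(r"(</?[A-Za-z][\w.-]*>)", instance):
        if piece in table:
            tokens.append(table[piece])
        elif piece:
            tokens.append(piece)
    return tokens


# Hypothetical three-element DTD used only for this example.
dtd = """<!ELEMENT DOC - - (TITLE, SECTION+)>
<!ELEMENT TITLE - - (#PCDATA)>
<!ELEMENT SECTION - - (#PCDATA)>"""
table = build_tag_code_table(extract_tags(dtd))
coded = code_tags("<DOC><TITLE>Report</TITLE><SECTION>Body text</SECTION></DOC>", table)
```

Because the table depends only on the DTD, the same `table` could be reused for every document sharing that DTD.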

[0022] If a plurality of tag documents having the same document type definition are coded, it is possible to code tags 
in the document type definitions of all of the tag documents on the basis of a tag code table created with respect to the 
first document. 

[0023] Accordingly, it is unnecessary to create a tag code table for each tag document so that the tag coding process
can be performed at a very high speed.

[0024] The present invention further provides a tag document compressing apparatus for coding a tag document
having a document type definition defining a tag showing a document structure and a document instance described using
the tag defined in the document type definition to compress the tag document comprising a tag extracting unit for
scanning the document type definition of an inputted tag document to extract the tag, a tag code creating unit for assigning
a predetermined code to the tag in the document type definition on the basis of the tag extracted by the tag extracting
unit to create a tag code table, a tag discriminating unit for determining whether data in the inputted document instance
is the tag extracted by the tag extracting unit, a coding process unit for coding the inputted data on the basis of the tag
code table when the tag discriminating unit determines that the inputted data is the tag, whereas coding the inputted
data in a predetermined coding system when the tag discriminating unit determines that the inputted data is not the tag,
and a special code outputting unit for outputting a special code showing coding of a tag to a decoding side of the tag
before the inputted data is coded when the tag discriminating unit discriminates that the inputted data is the tag.
[0025] The present invention also provides a tag document compressing method for coding a tag document having a
document type definition defining a tag showing a document structure and a document instance described using the
tag defined in the document type definition to compress the tag document comprising the steps of assigning a
predetermined code to the tag in the document type definition to create a tag code table, outputting a special code showing
coding of a tag to a decoding side of the tag when inputted data of the document instance is the tag and coding the
inputted data on the basis of the tag code table, whereas coding the inputted data in a predetermined coding system
when the inputted data is not the tag.
[0026] The tag document compressing apparatus and compressing method according to this invention can compress
very efficiently not only tags in a tag document but also the document other than the tags. It is therefore possible to
decrease a quantity of data of a tag document still further. Further, the decoding side can readily discriminate a tag by the
above special code. This largely contributes to speeding-up of the tag decoding process.

[0027] The above coding process unit may have a first coding unit for coding the inputted data on the basis of the tag
code table, a second coding unit for coding the inputted data in a predetermined coding system, and a switching control
unit for outputting the inputted data to the first coding unit when the tag discriminating unit determines that the inputted
data is the tag, whereas outputting the inputted data to the second coding unit when the tag discriminating unit
determines that the inputted data is not the tag. In such case, the coding process unit can be realized with a simple structure.
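The switching between the two coding units, with a special code announcing each coded tag, might be sketched as follows. Everything concrete here is an assumption made for illustration: the byte value 0xFF as the special code (chosen because it never occurs in valid UTF-8 text), one-byte tag codes (so at most 256 tags), and a plain UTF-8 pass-through standing in for the "predetermined coding system" of the second coding unit.

```python
import re

SPECIAL_CODE = 0xFF  # assumed marker; 0xFF cannot appear in UTF-8 encoded text
TAG_PATTERN = re.compile(r"(</?[A-Za-z][\w.-]*>)")


def first_coding_unit(tag, tag_code_table):
    """Codes a tag as its one-byte entry in the tag code table."""
    return bytes([tag_code_table[tag]])


def second_coding_unit(text):
    """Stand-in for the predetermined coding system: UTF-8 bytes, uncompressed."""
    return text.encode("utf-8")


def code_instance(instance, tag_code_table):
    """Switching control (sketch): route each piece to the matching coding unit."""
    out = bytearray()
    for piece in TAG_PATTERN.split(instance):
        if piece in tag_code_table:        # tag discriminating unit
            out.append(SPECIAL_CODE)       # special code precedes the coded tag
            out += first_coding_unit(piece, tag_code_table)
        elif piece:                        # ordinary text, or a tag not in the DTD
            out += second_coding_unit(piece)
    return bytes(out)
```

For example, with the table `{"<TITLE>": 0, "</TITLE>": 1}`, the instance `<TITLE>Hi</TITLE>` (17 bytes) is coded as the 6 bytes `FF 00 48 69 FF 01`. In a real coder the second coding unit would be an LZ or statistical coder rather than a pass-through.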

[0028] The above tag code table is created in such a manner that a tag is stored in the tag storing unit, and information
on a storing position in the tag storing unit is assigned as a code of the tag. Accordingly, a code is assigned to a tag
only by successively storing tags in the tag storing unit. It is therefore possible to create the above tag code table with
an extremely simple structure, and at a high speed.

[0029] If the above information on a storing position is information including address information of the above tag
storing unit, the tag coding can be performed at a higher speed since the address information of the tag storing unit is used
as it is as a code of a tag.

[0030] Concretely, if the above information on a storing position is, for example, the above address information and
information of a length of a tag, the tag coding side can readily specify a tag to be decoded since the length of the tag
is also assigned as a code of the tag. This largely contributes to speeding-up of the tag decoding process.
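A minimal sketch of a tag storing unit whose storing positions double as codes is given below. The tags are stored back to back in one string, and each code is the pair (address, length); the function names and the use of a Python string as the storing unit are assumptions for illustration.

```python
def build_tag_store(tags):
    """Stores tags consecutively; the code of a tag is (address, length)."""
    store = ""
    table = {}
    for tag in tags:
        table[tag] = (len(store), len(tag))  # address = current end of the store
        store += tag
    return store, table


def fetch_tag(store, code):
    """Recovers a tag directly from its (address, length) code."""
    address, length = code
    return store[address:address + length]


store, table = build_tag_store(["<TITLE>", "</TITLE>", "<SECTION>"])
```

No separate code assignment step is needed: storing the tags is what assigns the codes, and the length lets the reader of the store cut out exactly one tag without searching for its end.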

[0031] Alternatively, the above tag code table may be created in such a manner that a predetermined initial code is
assigned to a tag extracted by the tag extracting unit to create a first coding dictionary, and a code in the first coding
dictionary is updated according to a frequency of occurrence of a corresponding tag when the tag is coded. Accordingly,
as the coding of tags proceeds, a shorter code is assigned to a tag more frequently occurring, for example. This
largely improves a compression rate of tags.
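One way to picture such an adaptive dictionary is sketched below: every tag starts with an initial code equal to its extraction order, and after each coded tag the dictionary is reordered by occurrence count, so that frequent tags drift toward small code values (which a variable-length code would represent in fewer bits). The class name and the reorder-by-count rule are assumptions for illustration; the decoding side performs the identical update and therefore stays in step without any side information.

```python
class AdaptiveTagDictionary:
    """Sketch of a coding dictionary updated by frequency of occurrence."""

    def __init__(self, tags):
        self.order = list(tags)                # index in this list = current code
        self.counts = {tag: 0 for tag in tags}

    def _update(self, tag):
        self.counts[tag] += 1
        # Stable sort: tags with equal counts keep their current relative order.
        self.order.sort(key=lambda t: -self.counts[t])

    def encode(self, tag):
        code = self.order.index(tag)           # code under the dictionary so far
        self._update(tag)
        return code

    def decode(self, code):
        tag = self.order[code]
        self._update(tag)                      # same update as on the coding side
        return tag


tags = ["<A>", "<B>", "<C>"]
encoder = AdaptiveTagDictionary(tags)
codes = [encoder.encode(t) for t in ["<C>", "<C>", "<A>", "<C>"]]
decoder = AdaptiveTagDictionary(tags)
decoded = [decoder.decode(c) for c in codes]
```

In this run the tag `<C>` costs code 2 on its first occurrence and code 0 afterwards.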

[0032] Still alternatively, the above tag code table may be created in such a manner that the frequency of occurrence
of a tag in the document instance is counted, and a code according to a result of the counting is assigned to the tag to
create a second coding dictionary. Accordingly, it is possible to assign in advance a short code to a tag frequently
occurring before the tag is coded so as to improve a compression rate of tags and speed up the compressing process.
[0033] In the above case, the compressing apparatus of this invention may have an occurrence frequency information
outputting unit for outputting information on the frequency of occurrence of the above tag to the decoding side of the
tag, whereby the decoding side can readily create the same dictionary as the second coding dictionary. This largely
improves accuracy of the tag decoding process on the decoding side.

[0034] The above second coding dictionary creating unit may have a tag counting unit for determining whether the
tag extracted by the tag extracting unit coincides with the tag in the document instance to count the frequency of
occurrence of the tag in the document instance, a code generating unit for generating a code according to a result of the
counting by the tag counting unit, and a code holding unit for holding the code generated by the code generating unit to
create the second coding dictionary.

[0035] In the above case, it is possible to readily create the second coding dictionary.
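The counting, code generating and frequency outputting steps of the second coding dictionary can be sketched as below. The rank of a tag in descending order of count is used as its code, with ties broken alphabetically so that both sides derive the same dictionary from the same counts; this ranking rule, like the function names, is an assumption for illustration, and a real coder might instead derive Huffman or arithmetic codes from the counts.

```python
import re
from collections import Counter


def count_tags(dtd_tags, instance):
    """Tag counting unit (sketch): occurrences of each DTD tag in the instance."""
    found = re.findall(r"</?[A-Za-z][\w.-]*>", instance)
    counts = Counter(tag for tag in found if tag in dtd_tags)
    return {tag: counts[tag] for tag in dtd_tags}


def dictionary_from_counts(counts):
    """Code generating unit (sketch): most frequent tag gets the smallest code."""
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return {tag: code for code, tag in enumerate(ranked)}


dtd_tags = ["<A>", "</A>", "<B>", "</B>"]
counts = count_tags(dtd_tags, "<A><B>x</B><B>y</B></A>")
coding_dictionary = dictionary_from_counts(counts)

# The counts are the "occurrence frequency information": a decoding side that
# receives them can rebuild an identical dictionary.
decoding_dictionary = dictionary_from_counts(dict(counts))
```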

[0036] The present invention still further provides a tag document compressing apparatus for coding a tag document
having a document type definition defining a tag showing a document structure and a document instance described
using the tag defined in the document type definition to compress the tag document comprising a tag extracting unit for
scanning the document type definition of an inputted tag document to extract the tag, a tag code table creating unit for
assigning a predetermined code to the tag in the document type definition on the basis of the tag extracted by the tag
extracting unit to create a tag code table, a tag discriminating unit for determining whether inputted data in the document
instance is the tag extracted by the tag extracting unit, and a coding process unit for coding the inputted data on the
basis of the tag code table when the tag discriminating unit determines that the inputted data is the tag, whereas coding
the inputted data in a predetermined coding system when the tag discriminating unit determines that the inputted data
is not the tag.

[0037] The present invention also provides a tag document compressing method for coding a tag document having a
document type definition defining a tag showing a document structure and a document instance described using the
tag defined in the document type definition to compress the tag document comprising the steps of assigning a
predetermined code to the tag to create a tag code table, coding inputted data in the document instance on the basis of the
tag code table when the inputted data is the tag, whereas coding the inputted data in a predetermined coding system
when the inputted data is not the tag.
[0038] According to the tag document compressing apparatus and compressing method of this invention, a
predetermined code is assigned to a tag in the document type definition to create a tag code table, and inputted data is coded
on the basis of the above tag code table when the inputted data in the document instance is the tag, whereas the
inputted data is coded in a predetermined coding system when the inputted data is not the tag. Accordingly, it is possible to
further increase a compression rate since no special code is outputted.

[0039] The above tag discriminating unit may detect a start-tag showing a start of a tag on the basis of the tag
extracted by the tag extracting unit to determine that the inputted data is the tag.

[0040] In the above case, it is possible to discriminate a tag with a simpler structure and at a higher speed, thus the 
tag compressing process can be sped up. 

[0041] The present invention still further provides a tag document decompressing apparatus for decoding a coded tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to decompress the coded tag document comprising a
tag extracting unit for scanning the document type definition of an inputted tag document to extract the tag, a tag decode
table creating unit for assigning a predetermined code to the tag in the document type definition on the basis of the tag
extracted by the tag extracting unit to create a tag decode table, and a tag decoding unit for decoding the tag in the
coded document instance on the basis of the tag decode table created by the tag decode table creating unit.
[0042] The present invention also provides a tag document decompressing method for decoding a coded tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to decompress the coded tag document comprising the
steps of assigning a predetermined code to the tag in the document type definition to create a tag decode table, and
decoding the tag in the coded document instance on the basis of the tag decode table.

[0043] According to the tag document decompressing apparatus and method of this invention, a predetermined code
is assigned to a tag in the document type definition to create a tag decode table, and the tag in the coded document
instance is decoded on the basis of the tag decode table. Accordingly, it is possible to decode (decompress) tags in a
coded tag document very efficiently and accurately.

[0044] When a plurality of tag documents having the same document type definition are decoded, the above tag
decoding unit may decode tags in the document instances of all of the tag documents on the basis of the tag decode
table created with respect to the first tag document by the tag extracting unit and the tag decode table creating unit.
[0045] In the above case, it is unnecessary to create a tag decode table for each tag document so that the tag
decoding process can be performed at a very high speed.

[0046] The present invention still further provides a tag document decompressing apparatus for decoding a coded tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to decompress the coded tag document comprising a
tag extracting unit for scanning the document definition of an inputted tag document to extract the tag, a tag decode
table creating unit for assigning a predetermined code to the tag in the document type definition on the basis of the tag
extracted by the tag extracting unit to create a tag decode table, a special code discriminating unit for determining
whether inputted coded data is a special code showing inputting of coded data of a tag, and a decoding process unit
for decoding coded data following the special code on the basis of the decode table when the special code
discriminating unit determines that the coded data is the special code, whereas decoding the coded data in a predetermined
decoding system when the special code discriminating unit determines that the coded data is not the special code.
[0047] The present invention also provides a tag document decompressing method for decoding a coded tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to decompress the coded tag document comprising the
steps of assigning a predetermined code to the tag in the document type definition to create a tag decode table, and
decoding coded data inputted following a special code showing that coded data is inputted on the basis of the tag
decode table when the inputted coded data is the special code, whereas decoding the coded data in a predetermined
decoding system when the inputted coded data is not the special code.

[0048] According to the tag document decompressing apparatus and method of this invention, not only tags but also
a document other than the tags can be decompressed very efficiently and accurately. The tag document decompressing
apparatus and method of this invention can also determine whether coded data that is an object of the decompressing
is a tag or not only by detecting the special code. This largely speeds up the tag decompressing process.
[0049] Concretely, the above decoding process unit may have a first decoding unit for decoding the inputted coded
data on the basis of the tag decode table, a second decoding unit for decoding the inputted coded data in a
predetermined decoding system, and a switching control unit for outputting coded data following the special code to the first
decoding unit when the special code discriminating unit determines that the coded data is the special code, whereas
outputting the coded data to the second decoding unit when the special code discriminating unit determines that the
coded data is not the special code.

[0050] In the above case, the decoding process may be readily realized in a simple structure. 
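The decoding-side switching can be sketched in the same spirit. The sketch assumes a coded stream in which the byte 0xFF acts as the special code, the byte after it is an index into the tag decode table, and all other bytes are UTF-8 text standing in for the output of the "predetermined decoding system"; these conventions and the function name are assumptions for illustration.

```python
SPECIAL_CODE = 0xFF  # assumed marker; never occurs inside UTF-8 encoded text


def decode_document(coded, tag_decode_table):
    """Switching control (sketch): special code -> table lookup, otherwise text."""
    pieces = []
    text = bytearray()
    i = 0
    while i < len(coded):
        if coded[i] == SPECIAL_CODE:                    # special code discriminating unit
            pieces.append(text.decode("utf-8"))         # flush pending text (second unit)
            text.clear()
            pieces.append(tag_decode_table[coded[i + 1]])  # first decoding unit
            i += 2
        else:
            text.append(coded[i])
            i += 1
    pieces.append(text.decode("utf-8"))
    return "".join(pieces)
```

For instance, `decode_document(b"\xff\x00Hi\xff\x01", ["<TITLE>", "</TITLE>"])` yields `<TITLE>Hi</TITLE>`. The decoder never has to inspect the text for tags; a single byte comparison decides which decoding unit handles the next data.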
[0051] Alternatively, the tag decode table creating unit may have a tag storing unit for storing the tag extracted by the
tag extracting unit, and assign information on a position in which the tag is stored in the tag storing unit as a code of the
tag to create the tag decode table.

[0052] In the above case, a code is assigned to each tag only by successively storing tags in the tag storing unit so
that the above tag decode table can be created with a simple structure and at a high speed.
[0053] The above information on a storing position may be information including address information of the tag storing
unit. In such case, the tag decoding side can readily fetch a tag corresponding to coded data from the tag storing unit
so long as the tag is coded as information including the address information on the coding side since the address
information of the tag storing unit is used as it is as a code of the tag. This largely speeds up the tag decoding process.
[0054] Concretely, if the information on a storing position is the above address information and information on a
length of a tag, the length of the tag is also assigned as a code of the tag. So long as a tag is coded with the address
information and the information on a length of the tag on the coding side, it is possible to fetch a tag corresponding to
the coded data from the tag storing unit more accurately. This largely contributes to speeding-up and improvement in
accuracy of the tag decoding process.

[0055] Still alternatively, the above tag decode table creating unit may have a first decoding dictionary creating unit
for assigning a predetermined initial code to the tag extracted by the tag extracting unit to create a first decoding
dictionary of the tag as the tag decode table, and a decoding dictionary updating unit for updating the code in the first
decoding dictionary created by the first decoding dictionary creating unit according to the frequency of occurrence of a
corresponding tag when the decoding process unit decodes the tag.

[0056] In the above case, a shorter code is reassigned to a tag more frequently occurring as the decoding of tags
proceeds. This largely improves efficiency of the tag decoding.

[0057] The above tag decode table may be created as a second decoding dictionary in such a manner that a code is
assigned to a tag in the document instance according to the frequency of occurrence of the tag on the basis of the tag
in the document type definition and information on the frequency of occurrence of the tag. In such case, a short code
is in advance assigned to a tag frequently occurring before the tag is decoded so that an efficiency of the tag decoding
may be improved and the decoding process may be sped up.

[0058] The present invention still further provides a tag document decompressing apparatus for decoding a coded tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to decompress the coded tag document comprising a
tag extracting unit for scanning the document type definition of an inputted tag document to extract the tag, a tag decode
table creating unit for assigning a predetermined code to the tag in the document type definition on the basis of the tag
extracted by the tag extracting unit to create a tag decode table, a tag code discriminating unit for determining whether
inputted coded data is coded data of the tag, and a decoding process unit for decoding the coded data on the basis of
the tag decode table when the tag code discriminating unit determines that the coded data is the tag, whereas decoding
the coded data in a predetermined decoding system when the code discriminating unit determines that the coded data
is not the tag.

[0059] The present invention also provides a tag document decompressing method for decoding a coded tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to decompress the coded tag document comprising the
steps of assigning a predetermined code to the tag in the document type definition to create a tag decode table, and
decoding inputted coded data on the basis of the tag decode table when the inputted coded data is coded data of the
tag, whereas decoding the inputted coded data in a predetermined decoding system when the inputted coded data is
not coded data of the tag.

[0060] According to the tag document decompressing apparatus and method of this invention, it is possible to
accurately perform the tag decompressing process while increasing efficiency of the compression on the coding side since
no special code is received.

[0061] At this time, the tag code discriminating unit may detect a start-tag showing a start of a tag to determine that
the coded data is the tag. In such case, it is possible to discriminate a tag with a simple structure and at a high speed
so as to speed up the tag decompressing process.

[0062] The present invention still further provides a tag document compressing/decompressing apparatus for coding
a tag document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to compress the tag document, and decoding the coded
tag document to decompress the same comprising a tag extracting unit for scanning the document type definition of an
inputted tag document to extract the tag, a tag code/decode table creating unit for assigning a predetermined code to
the tag in the document type definition on the basis of the tag extracted by the tag extracting unit to create a tag
code/decode table, a tag coding unit for coding the tag in the document instance on the basis of the tag code/decode
table created by the tag code/decode table creating unit, and a tag decoding unit for decoding the tag in the document
instance coded by the tag coding unit on the basis of the tag code/decode table created by the tag code/decode table
creating unit.

[0063] The present invention also provides a tag document compressing/decompressing method for coding a tag doc- 
ument having a document type definition defining a tag showing a document structure and a document instance 
described using the tag defined in the document type definition to compress the tag document, and decoding the coded 



EP 0 896 284 A1



tag document to decompress the same comprising the steps of assigning a predetermined code to the tag in the
document type definition to create a tag code/decode table, coding the tag in the document instance on the basis of the tag
code/decode table, and decoding the coded tag on the basis of the tag code/decode table.
[0064] According to the tag document compressing/decompressing apparatus and method of this invention, a prede- 
5 termined code is assigned to a tag in the document instance to create a tag oode/decode table, and, when the tag is 
decoded, the tag is decoded on the basis of the above tag code/decode table used when the tag Is coded. It is thereby 
unnecessary to create at least a decode table for decoding a tag separately from a code table for coding the tag. This 
largely contributes to speeding-up of the tag decoding (decompressing) process and a decrease of a scale of the appa- 
ratus. 
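The idea of one table serving both directions can be sketched as follows. This is only an illustrative sketch, not the implementation described in the specification: the class name `TagCodeTable` and the choice of list positions as codes are assumptions made for the example.

```python
from typing import List


class TagCodeTable:
    """One table used for both coding and decoding (hypothetical sketch)."""

    def __init__(self, tags: List[str]) -> None:
        # Assumption: the position of a tag in this list is its code.
        # dict.fromkeys drops duplicate tags while keeping first-seen order.
        self.tags = list(dict.fromkeys(tags))

    def code(self, tag: str) -> int:
        """Coding direction: look up the code assigned to a tag."""
        return self.tags.index(tag)

    def decode(self, code: int) -> str:
        """Decoding direction: look up the tag assigned to a code."""
        return self.tags[code]


table = TagCodeTable(["<TITLE>", "</TITLE>"])
restored = table.decode(table.code("</TITLE>"))
```

Because both directions read the same list, no second table has to be built or kept in step with the first.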

[0065] The present invention still further provides a tag document compressing/decompressing apparatus for coding
a tag document having a document type definition defining a tag showing a document structure and a document
instance described using the tag defined in the document type definition to compress the tag document, and decoding
the coded tag document to decompress the same, comprising a tag extracting unit for scanning the document type
definition of an inputted tag document to extract the tag, a tag code/decode table creating unit for assigning a
predetermined code to the tag in the document type definition on the basis of the tag extracted by the tag extracting unit to create
a tag code/decode table, a tag discriminating unit for determining whether inputted data in the document instance is the
tag extracted by the tag extracting unit, a coding process unit for coding the inputted data on the basis of the tag
code/decode table when the tag discriminating unit determines that the inputted data is the tag, whereas coding the
inputted data in a predetermined coding system when the tag discriminating unit determines that the inputted data is
not the tag, a special code outputting unit for outputting a special code showing coding of a tag before the inputted data
is coded when the tag discriminating unit determines that the inputted data is the tag, a special code discriminating unit
for determining whether coded data outputted from the coding process unit is the special code, and a decoding process
unit for decoding coded data following the special code outputted from the coding process unit on the basis of the tag
code/decode table when the special code discriminating unit determines that the coded data is the special code,
whereas decoding the coded data outputted from the coding process unit in a predetermined decoding system when
the special code discriminating unit determines that the coded data is not the special code.
[0066] The present invention also provides a tag document compressing/decompressing method for coding a tag
document having a document type definition defining a tag showing a document structure and a document instance
described using the tag defined in the document type definition to compress the tag document, and decoding the coded
tag document to decompress the same, comprising the steps of assigning a predetermined code to the tag in the
document type definition to create a tag code/decode table, outputting a special code showing coding of a tag when
inputted data in the document instance is the tag and coding the inputted data on the basis of the tag code/decode table,
whereas coding the inputted data in a predetermined coding system when the inputted data is not the tag, and, when
coded data is decoded, decoding coded data following the special code on the basis of the tag code/decode table when
the coded data is the special code, whereas decoding the coded data in a predetermined decoding system when the
coded data is not the special code.

[0067] According to the tag document compressing/decompressing apparatus and method of this invention, a
predetermined code is assigned to a tag in the document instance to create a tag code/decode table, and the tag is decoded
on the basis of the tag code/decode table when a special code similar to the above is detected in the event of tag
decoding. Similarly to the above case, this contributes greatly to speeding up the tag decoding (decompressing) process
and to reducing the scale of the apparatus. With the above special code, it is possible to specify a tag that is an object
of the decoding and to decode the tag quickly and accurately.
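The special-code scheme can be illustrated with a short sketch. Everything concrete here is an assumption made for the example rather than something stated in the specification: the special code is taken to be the single byte 0x00 (assumed never to occur in the text), tag codes are one byte each (so at most 256 tags), tags are recognised with a simple regular expression, and the "predetermined coding system" for non-tag data is plain UTF-8 pass-through.

```python
import re
from typing import List

ESCAPE = b"\x00"  # hypothetical special code; assumed absent from the text
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")  # simplified tag pattern


def encode(instance: str, tags: List[str]) -> bytes:
    """Emit ESCAPE + one-byte tag code for known tags; pass other data through."""
    out = bytearray()
    pos = 0
    for match in TAG_RE.finditer(instance):
        out += instance[pos:match.start()].encode("utf-8")
        tag = match.group(0)
        if tag in tags:
            out += ESCAPE + bytes([tags.index(tag)])
        else:
            out += tag.encode("utf-8")  # not in the table: treated as ordinary data
        pos = match.end()
    out += instance[pos:].encode("utf-8")
    return bytes(out)


def decode(data: bytes, tags: List[str]) -> str:
    """Look up the byte after each ESCAPE in the same table; copy other bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == ESCAPE:
            out += tags[data[i + 1]].encode("utf-8")
            i += 2
        else:
            out.append(data[i])
            i += 1
    return out.decode("utf-8")


tags = ["<TITLE>", "</TITLE>"]
coded = encode("<TITLE>Hello</TITLE>", tags)
```

The decoder never has to guess whether a byte is a tag code: only the byte that follows the special code is looked up in the table.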

[0068] The present invention still further provides a recording medium readable by a computer storing a tag document
compressing program for coding a tag document having a document type definition defining a tag showing a document
structure and a document instance described using the tag defined in the document type definition to compress the tag
document, characterized in that the tag document compressing program makes the computer function as a tag
extracting unit for scanning the document type definition of an inputted tag document to extract the tag, a tag code table
creating unit for assigning a predetermined code to the tag on the basis of the tag extracted by the tag extracting unit to
create a tag code table, and a tag coding unit for coding the tag in the document instance on the basis of the tag code
table created by the tag code table creating unit.

[0069] The present invention also provides a recording medium readable by a computer storing a tag document
compressing program for coding a tag document having a document type definition defining a tag showing a document
structure and a document instance described using the tag defined in the document type definition to compress the tag
document, characterized in that the tag document compressing program makes the computer function as a tag
extracting unit for scanning the document type definition of an inputted tag document to extract the tag, a tag code table
creating unit for assigning a predetermined code to the tag in the document type definition on the basis of the tag extracted
by the tag extracting unit to create a tag code table, a tag discriminating unit for determining whether inputted data in
the document instance is the tag extracted by the tag extracting unit, a coding process unit for coding the inputted data
on the basis of the tag code table when the tag discriminating unit determines that the inputted data is the tag, whereas
coding the inputted data in a predetermined coding system when the tag discriminating unit determines that the
inputted data is not the tag, and a special code outputting unit for outputting a special code showing coding of a tag to a
decoding side of the tag before the inputted data is coded when the tag discriminating unit determines that the inputted
data is the tag.

[0070] The present invention still further provides a recording medium readable by a computer storing a tag document
decompressing program for decoding a coded tag document having a document type definition defining a tag showing
a document structure and a document instance described using the tag defined in the document type definition to
decompress the coded tag document, characterized in that the tag document decompressing program makes the
computer function as a tag extracting unit for scanning the document type definition of an inputted tag document to extract
the tag, a tag decode table creating unit for assigning a predetermined code to the tag in the document type definition
on the basis of the tag extracted by the tag extracting unit to create a tag decode table, and a tag decoding unit for
decoding the tag in the coded document instance on the basis of the tag decode table created by the tag decode table
creating unit.

[0071] The present invention also provides a recording medium readable by a computer storing a tag document
decompressing program for decoding a coded tag document having a document type definition defining a tag showing
a document structure and a document instance described using the tag defined in the document type definition to
decompress the tag document, characterized in that the tag document decompressing program makes the computer
function as a tag extracting unit for scanning the document type definition of an inputted tag document to extract the tag,
a tag decode table creating unit for assigning a predetermined code to the tag in the document type definition on the
basis of the tag extracted by the tag extracting unit to create a tag decode table, a special code discriminating unit for
determining whether inputted coded data is a special code showing that coded data of a tag is inputted, and a decoding
process unit for decoding coded data inputted following the special code on the basis of the tag decode table when the
special code discriminating unit determines that the coded data is the special code, whereas decoding the coded data in a
predetermined decoding system when the special code discriminating unit determines that the coded data is not the special code.
[0072] The present invention still further provides a recording medium readable by a computer storing a tag document
compressing/decompressing program for coding a tag document having a document type definition defining a tag
showing a document structure and a document instance described using the tag defined in the document type definition
to compress the tag document and decoding the coded tag document to decompress the same, characterized in that
the tag document compressing/decompressing program makes the computer function as a tag extracting unit for
scanning the document type definition of an inputted tag document to extract the tag, a tag code/decode table creating unit
for assigning a predetermined code to the tag on the basis of the tag extracted by the tag extracting unit to create a tag
code/decode table, a tag coding unit for coding the tag in the document instance on the basis of the tag code/decode
table created by the tag code/decode table creating unit, and a tag decoding unit for decoding the tag in the document
instance coded by the tag coding unit on the basis of the tag code/decode table created by the tag code/decode table
creating unit.

[0073] The present invention also provides a recording medium readable by a computer storing a tag document
compressing/decompressing program for coding a tag document having a document type definition defining a tag showing
a document structure and a document instance described using the tag defined in the document type definition to
compress the tag document and decoding the coded tag document to decompress the same, characterized in that the tag
document compressing/decompressing program makes the computer function as a tag extracting unit for scanning the
document type definition of an inputted tag document to extract the tag, a tag code/decode table creating unit for
assigning a predetermined code to the tag in the document type definition on the basis of the tag extracted by the tag
extracting unit to create a tag code/decode table, a tag discriminating unit for determining whether inputted data of the
document instance is the tag extracted by the tag extracting unit, a coding process unit for coding the inputted data on
the basis of the tag code/decode table when the tag discriminating unit determines that the inputted data is the tag,
whereas coding the inputted data in a predetermined coding system when the tag discriminating unit determines that the
inputted data is not the tag, a special code outputting unit for outputting a special code showing coding of a tag before the
inputted data is coded when the tag discriminating unit determines that the inputted data is one of the tags, and a decoding
process unit for decoding coded data following the special code outputted from the coding process unit on the basis of
the tag code/decode table when the special code discriminating unit determines that the coded data is the special code,
whereas decoding the coded data in a predetermined decoding system when the special code discriminating unit
determines that the coded data is not the special code.

[0074] Each of the above tag document compressing apparatus, tag document decompressing apparatus and
tag document compressing/decompressing apparatus may be readily realized by storing a compressing program, a
decompressing program or a compressing/decompressing program in a recording medium readable by a computer,
and providing the recording medium to a desired computer. This greatly improves the versatility of this invention, leading to
a spread of this invention.
[0075] In the above, the term 'document instance' refers to the body or substance of the document.
[0076] In the drawings: 

FIG. 1 is a block diagram showing a computer system to which a compressing apparatus and a decompressing apparatus for an SGML document (tag document) according to a first embodiment of this invention are applied;
FIG. 2 is a block diagram showing a structure of an essential part of a personal computer as the compressing apparatus for an SGML document according to the first embodiment;
FIG. 3 is a flowchart for illustrating an operation of the compressing apparatus for an SGML document according to the first embodiment;
FIG. 4 is a block diagram showing a structure of an essential part of a personal computer as the decompressing apparatus for an SGML document according to the first embodiment;
FIG. 5 is a flowchart for illustrating an operation of the decompressing apparatus for an SGML document according to the first embodiment;
FIG. 6 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML document according to a second embodiment of this invention;
FIG. 7 is a flowchart for illustrating an operation of the compressing apparatus for an SGML document according to the second embodiment;
FIG. 8 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML document according to the second embodiment of this invention;
FIG. 9 is a flowchart for illustrating an operation of the decompressing apparatus for an SGML document according to the second embodiment;
FIG. 10 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML document according to a third embodiment of this invention;
FIG. 11 is a diagram for illustrating an operation of the compressing apparatus for an SGML document according to the third embodiment;
FIG. 12 is a flowchart for illustrating the operation of the compressing apparatus for an SGML document according to the third embodiment;
FIG. 13 is a diagram showing the operation of the compressing apparatus for an SGML document according to the third embodiment;
FIG. 14 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML document according to the third embodiment of this invention;
FIG. 15 is a flowchart for illustrating an operation of the decompressing apparatus for an SGML document according to the third embodiment;
FIG. 16 is a block diagram showing a modification of the decompressing apparatus for an SGML document according to the third embodiment;
FIG. 17 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML document according to a fourth embodiment of this invention;
FIG. 18 is a flowchart for illustrating an operation of the compressing apparatus for an SGML document according to the fourth embodiment;
FIG. 19 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML document according to the fourth embodiment of this invention;
FIG. 20 is a flowchart for illustrating an operation of the decompressing apparatus for an SGML document according to the fourth embodiment;
FIG. 21 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML document according to a fifth embodiment of this invention;
FIG. 22 is a block diagram showing a structure of a code creating unit of the compressing apparatus for an SGML document according to the fifth embodiment;
FIG. 23 is a flowchart for illustrating an operation of the compressing apparatus for an SGML document according to the fifth embodiment;
FIG. 24 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML document according to the fifth embodiment of this invention;
FIG. 25 is a flowchart for illustrating an operation of the decompressing apparatus for an SGML document according to the fifth embodiment;
FIG. 26 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML document according to a sixth embodiment of this invention;
FIG. 27 is a flowchart for illustrating an operation of the compressing apparatus for an SGML document according to the sixth embodiment;
FIG. 28 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML document according to the sixth embodiment of this invention;
FIG. 29 is a flowchart for illustrating an operation of the decompressing apparatus for an SGML document according to the sixth embodiment;
FIG. 30 is a block diagram showing a structure of an essential part of a compressing/decompressing apparatus for an SGML document according to an embodiment of this invention;
FIG. 31 is a diagram schematically showing a format of an SGML document;
FIG. 32 is a diagram showing an example of description of document type definition (DTD) of an SGML document; and
FIG. 33 is a diagram showing an example of description of document instance of an SGML document.

(a) Description of a First Embodiment of This Invention

[0077] FIG. 1 is a block diagram showing a computer system to which a compressing apparatus and a decompressing
apparatus for an SGML document (tag document) according to a first embodiment of this invention are applied. As
shown in FIG. 1, the system according to this embodiment is configured with personal computers 2 and 3 connected to
a network 6 such as the Internet via network connecting apparatus 4 such as modems or TAs (Terminal
Adapters).

[0078] Each of the personal computers 2 and 3 has, as shown in FIG. 1, a personal computer main body 21, a display
(display screen) 22, a keyboard 23, a mouse (pointing device) 24, etc. The user can create the above-described
SGML document (tag document) with an editor on the personal computer 2 or 3 through the keyboard 23, store the
created document as a document file in a hard disk (storage apparatus) 27 in the main body 21 through a process by a
CPU (Central Processing Unit) 26, or provide the created document (that is, transfer the file) to the other personal
computer 3 or 2 over the network 6.

[0079] Since an SGML document contains a large quantity of data, when it is stored in the hard disk 27 or transferred
over the network 6 as above, it is desirable that the SGML document be coded and compressed before being stored or
transferred, in order to save memory capacity, data transmission quantity and data transmission time, and that the
compressed document then be decompressed (decoded) when it is displayed on the display 22 or printed out.
[0080] Particularly, in the case of a system in which plural kinds of SGML documents are circulated (for example, a
CALS system or the like), portions other than the document instance 303 of the SGML document are required to be
sent each time. By encoding and compressing the SGML document before sending it, rather than sending the SGML
document as it is, it is possible to decrease the transmission time and the storage capacity required on the transmitter's side
(the server's side) and the receiver's side (the client's side) of the document.

[0081] According to this embodiment, a compression program or a decompression program for the SGML document
is stored in the hard disk 27, and the CPU 26 operates according to the program, whereby the personal computer 2 or
3 (concretely, the CPU 26) is used as a compressing apparatus which codes and compresses the SGML document or
a decompressing apparatus which decodes and decompresses the SGML document having been coded and
compressed.

[0082] Hereinafter, for the sake of convenience, description will be made on the assumption that the personal computer 2
is a compressing apparatus for an SGML document, whereas the personal computer 3 is a decompressing apparatus for an
SGML document.

[0083] The user can create each of the above programs using the personal computer 2 or 3 and store it in the hard
disk 27 in advance. Alternatively, the user can store the program in the hard disk 27 by reading, through a disk drive 25,
the program stored in advance in a recording medium 15 of various types, such as a floppy disk (FD) 11, a CD-ROM 12,
an MO (magneto-optic disk) 13, or the like.

(a1) Description of a Compressing Apparatus (Coding Side) for an SGML Document

[0084] FIG. 2 is a block diagram showing a structure of an essential part of the personal computer 2 as a compressing
apparatus for the above SGML document. As shown in FIG. 2, the personal computer (hereinafter referred to as a
compressing apparatus) 2 according to this embodiment has an SGML tag extracting unit 30, a tag code table creating unit
40, a tag discriminating unit 50 and a tag coding unit 60.

[0085] The SGML tag extracting unit 30 scans the DTD (document type definition) 302 (refer to FIG. 31) of an SGML
document inputted by, for example, the CPU 26 reading the SGML document stored as a document file in the hard disk 27,
and extracts tags defined in the DTD 302. The tag code table creating unit 40 assigns a predetermined code
to each of the tags in the DTD 302 on the basis of the tags extracted by the tag extracting unit 30 so as to create a tag
code table. For instance, values other than those assigned to characters (in Unicode, for example) are assigned as the
codes of the tags.

[0086] The tag discriminating unit 50 determines whether data (a character or a character string) in the document
instance 303 of the SGML document inputted together with the DTD 302 is a tag or not. If the inputted data is a tag, the
tag discriminating unit 50 outputs the data to the tag coding unit 60. If the inputted data is not a tag, the tag
discriminating unit 50 outputs the data as it is to the outside (the hard disk 27 or the network 6, for example).
[0087] The tag coding unit 60 codes the tags in the document instance 303 of the SGML document on the basis of
the tag code table created by the tag code table creating unit 40. Here, the tag coding unit 60 outputs the code in the
above code table corresponding to the inputted data (tag) from the tag discriminating unit 50 as a code of the tag.
[0088] In the compressing apparatus 2 with the above structure according to the first embodiment, as shown in FIG.
3, the SGML tag extracting unit 30 scans the DTD 302 of the SGML document to extract tags (Step A1), and the tag
code table creating unit 40 assigns a predetermined code to each of the extracted tags to create a tag code table (Step
A2). When the tag discriminating unit 50 determines that data in the document instance 303 of the inputted SGML
document is a tag, the tag coding unit 60 codes the data on the basis of the above tag code table and outputs the coded
data (Step A3).
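Steps A1 through A3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: the function names are hypothetical, the DTD scan only handles simple `<!ELEMENT name ...>` declarations (not SGML name groups or tag minimization), and the "codes not assigned to characters" are stood in for by Unicode private-use code points starting at U+E000.

```python
import re
from typing import Dict, List

# Assumption: only simple element declarations such as <!ELEMENT TITLE ...> occur.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+([A-Za-z][\w.-]*)", re.IGNORECASE)


def extract_tags(dtd: str) -> List[str]:
    """Step A1: scan the DTD and list a start-tag and an end-tag per element."""
    tags: List[str] = []
    for name in ELEMENT_RE.findall(dtd):
        for tag in (f"<{name}>", f"</{name}>"):
            if tag not in tags:
                tags.append(tag)
    return tags


def create_code_table(tags: List[str]) -> Dict[str, str]:
    """Step A2: assign one private-use code point to each tag (an assumption)."""
    return {tag: chr(0xE000 + i) for i, tag in enumerate(tags)}


def code_tags(instance: str, table: Dict[str, str]) -> str:
    """Step A3: replace each known tag by its code; other data passes through."""
    if not table:
        return instance
    alternatives = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tag) for tag in alternatives))
    return pattern.sub(lambda m: table[m.group(0)], instance)


dtd = "<!ELEMENT TITLE - - (#PCDATA)>"
table = create_code_table(extract_tags(dtd))
coded = code_tags("<TITLE>abc</TITLE>", table)
```

In this sketch each tag, whatever its length, shrinks to a single code point, while the text between the tags is left untouched.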

[0089] Assume, for example, that the SGML tag extracting unit 30 extracts the tags <TITLE> and </TITLE>, and that the
tag code table creating unit 40 assigns <TITLE>="00" and </TITLE>="10" to the respective tags so as to create a tag code
table. If

<TITLE>|6W (%3g)?^»*</TITLE> 

is inputted at this time as the document instance 303, for example, the tag discriminating unit 50 first determines that
<TITLE> is a tag and outputs the tag to the tag coding unit 60. The tag coding unit 60 obtains a code "00" corresponding
to <TITLE> by referring to the above tag code table on the basis of the inputted tag (<TITLE>), and outputs "00" as a
code of <TITLE>.

[0090] The tag discriminating unit 50 next determines whether the data inputted following the above tag (<TITLE>)
is a tag or not. Following the above <TITLE> is the text of the title,
so that the tag discriminating unit 50 determines that the inputted data is other than a tag and outputs the inputted
data as it is, without coding it.

[0091] After that, the tag discriminating unit 50 further determines whether inputted data is a tag or not. Here, following
the above title text is </TITLE> (an end-tag), so that the tag discriminating unit 50 outputs the tag to the tag coding
unit 60. The tag coding unit 60 obtains a code "10" corresponding to </TITLE> by referring to the above tag code table
on the basis of the inputted tag (</TITLE>), and outputs "10" as a code of </TITLE>.

[0092] As a result, in the above document instance 303, only the tags are coded and compressed, so that the code "00",
the unchanged title text, and the code "10" are finally outputted in that order. According to this embodiment, the
DTD 302 is not coded and is thus outputted as it is.
[0093] According to this embodiment, the compressing apparatus 2 for an SGML document assigns a predetermined
code to each of the tags in the DTD 302 to create a tag code table, and codes tags in the document instance 303 on the
basis of the tag code table. It is thereby possible to compress very efficiently the tags that are, in general, frequently
used in the SGML document, thus greatly decreasing the quantity of data in the SGML document.

[0094] Therefore, not only is the memory capacity used to store an SGML document saved, but the transmission
quantity of data and the transmission time of data at the time of transmission of the SGML document over the network 6
are also greatly decreased.

(a2) Description of a Decompressing Apparatus (Decoding Side) for an SGML Document

[0095] FIG. 4 is a block diagram showing a structure of an essential part of the personal computer 3 as a
decompressing apparatus for the above SGML document. The personal computer (hereinafter referred to as a decompressing
apparatus) 3 shown in FIG. 4 is to decompress (decode) the SGML document coded (compressed) by the above
compressing apparatus 2 shown in FIG. 2. According to this embodiment, the decompressing apparatus 3 has an
SGML tag extracting unit 30', a tag decode table creating unit 40', a tag discriminating unit 50' and a tag decoding unit
60'.

[0096] The SGML tag extracting unit 30' scans the DTD 302 (not coded) inputted from the compressing apparatus 2
over, for example, the network 6 to extract tags defined in the DTD 302. The tag decode table creating unit 40' assigns
a predetermined code to each of the tags in the DTD 302 on the basis of the tags extracted by the tag extracting unit
30' to create a tag decode table.

[0097] The tag discriminating unit 50' determines whether data in the document instance 303 of the SGML document,
in which only tags have been coded on the coding side and which is inputted together with the DTD 302, is a code of a
tag or not. If the inputted data is a code of a tag, the tag discriminating unit 50' outputs the coded data to the tag
decoding unit 60'. If the inputted data is other than a code of a tag, the tag discriminating unit 50' outputs the inputted
data as it is to the outside (the hard disk 27, for example). For instance, if values other than those assigned to
characters (in Unicode, for example) are assigned as codes on the coding side, it is possible to detect a code of a tag by
detecting data other than characters.
[0098] The tag decoding unit 60' decodes the tags in the coded document instance 303 on the basis of the tag decode
table created by the tag decode table creating unit 40'. Here, the tag decoding unit 60' outputs the tag in the above decode
table corresponding to the data (a code of the tag) inputted from the tag discriminating unit 50' as a result of the
decoding.

[0099] In the decompressing apparatus 3 with the above structure according to the first embodiment, as shown in
FIG. 5, the SGML tag extracting unit 30' scans the DTD 302 of the SGML document to extract tags (Step B1), and the
tag decode table creating unit 40' assigns the same code as the coding side to each of the extracted tags to create the
tag decode table (Step B2). When the tag discriminating unit 50' determines that the data in the document instance
303 of the inputted SGML document is a code of a tag, the tag decoding unit 60' decodes the data on the basis of the
above tag decode table to obtain the tag and outputs the tag (Step B3).
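Steps B1 through B3 can be sketched in the same spirit. Again this is only an assumed illustration: the function names are hypothetical, only simple `<!ELEMENT name ...>` declarations are recognised, and codes are taken to be Unicode private-use code points starting at U+E000, assigned in DTD order so that the decoding side arrives at the same assignment as the coding side without any table being transmitted.

```python
import re
from typing import Dict, List

# Assumption: only simple element declarations such as <!ELEMENT TITLE ...> occur.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+([A-Za-z][\w.-]*)", re.IGNORECASE)


def extract_tags(dtd: str) -> List[str]:
    """Step B1: scan the (uncoded) DTD exactly as the coding side would."""
    tags: List[str] = []
    for name in ELEMENT_RE.findall(dtd):
        for tag in (f"<{name}>", f"</{name}>"):
            if tag not in tags:
                tags.append(tag)
    return tags


def create_decode_table(tags: List[str]) -> Dict[str, str]:
    """Step B2: assign the same codes as the coding side, keyed by code."""
    return {chr(0xE000 + i): tag for i, tag in enumerate(tags)}


def decode_tags(coded: str, table: Dict[str, str]) -> str:
    """Step B3: expand each tag code to its tag; other characters pass through."""
    return "".join(table.get(ch, ch) for ch in coded)


dtd = "<!ELEMENT TITLE - - (#PCDATA)>"
decode_table = create_decode_table(extract_tags(dtd))
restored = decode_tags("\ue000abc\ue001", decode_table)
```

Because every code lies outside the range used for ordinary text in this sketch, a single dictionary lookup per character is enough to tell tag codes from data.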

[0100] Assume, for example, that the tag extracting unit 30' and the tag decode table creating unit 40' create a tag
decode table in which codes are assigned to the respective tags as <TITLE>="00" and </TITLE>="10", in the same way as on
the coding side. If the document instance coded on the coding side (the code "00", the title text, and the code "10") is
inputted at this time, for example, the tag discriminating unit 50' determines that "00" is a code of a tag and outputs the
coded data to the tag decoding unit 60'.

[0101] The tag decoding unit 60' obtains a tag <TITLE> corresponding to "00" by referring to the above tag decode
table on the basis of the inputted code "00" of the tag, and outputs <TITLE> as a result of the decoding of the code "00".
[0102] The tag discriminating unit 50' then determines whether inputted data following the above "00" is a code of a
tag or not. Here, following the above "00" is the title text,
so that the tag discriminating unit 50' determines that the inputted data is other than a code of a tag, and thus outputs
the data as it is, without decoding it.

[0103] After that, the tag discriminating unit 50' determines whether the data following the inputted data is a code of a
tag or not. Here, following the above title text is a code of a tag, "10", so that the tag discriminating unit 50' outputs the
code of the tag to the tag decoding unit 60'. The tag decoding unit 60' obtains a tag (</TITLE>) corresponding to the code
"10" by referring to the above tag decode table on the basis of the inputted code "10" of the tag, and outputs </TITLE> as
a result of the decoding of the code "10".
[0104] As a result, the document instance 303 of the SGML document in which only tags have been coded is decoded
to the original state, that is, the title text enclosed between <TITLE> and </TITLE>, and outputted.

[0105] According to this embodiment, the decompressing apparatus 3 for the SGML document assigns the same
code as the coding side to each of the tags in the DTD 302 to create the tag decode table, and decodes tags in the
document instance 303 of the coded SGML document on the basis of the tag decode table. It is thereby possible to decode
(decompress) tags in the SGML document very efficiently and correctly.

(b) Description of a Second Embodiment

(b1) Description of a Compressing Apparatus (Coding Side) for an SGML Document 

[0106] FIG. 6 is a block diagram showing a structure of an essential part of a compressing apparatus for a tag
document according to a second embodiment of this invention. A compressing apparatus 2 shown in FIG. 6 additionally has
a DTD comparing unit 70 and a controller 80, as compared with the compressing apparatus 2 shown in FIG. 2.
[0107] The above DTD comparing unit 70 compares a DTD 302 of an SGML document newly inputted with a DTD
302 of a past SGML document inputted immediately before the DTD 302 of the newly inputted SGML document, and
outputs an agreement/disagreement signal for each pair of the DTDs 302 to the controller 80. According to this
embodiment, the DTD comparing unit 70 successively holds an inputted DTD 302, and compares it with a newly inputted DTD
302.

[0108] The controller 80 controls a code table creating process by the tag code table creating unit 40 according to the
agreement/disagreement signal from the DTD comparing unit 70. When receiving the agreement signal for DTDs 302
from the DTD comparing unit 70, the controller 80 directs the tag code table creating unit 40 to maintain the tag code
table created in the past. When receiving the disagreement signal for the DTDs 302, the controller 80 directs the tag
code table creating unit 40 to update the tag code table.

[0109] The tag code table creating unit 40 according to this embodiment maintains a tag code table created with
respect to the first document among a plurality of documents while SGML documents having the same DTD 302 are
inputted. When an SGML document having a different DTD 302 is inputted, the tag code table creating unit 40 assigns
a predetermined code to each of the tags extracted from the DTD 302 by the SGML tag extracting unit 30 to re-create a
tag code table, as in the first embodiment.

[0110] Next, description will be made of an operation of the compressing apparatus 2 with the above structure
according to the second embodiment, referring to a flowchart (Steps C1 through C4) shown in FIG. 7. When a DTD 302 is
newly inputted to the compressing apparatus 2, the compressing apparatus 2 compares the DTD 302 with a DTD 302
inputted in the past by the DTD comparing unit 70 (Step C1). If the comparison shows that the DTDs 302 do not
agree with each other (NO at Step C1), the DTD comparing unit 70 outputs the disagreement signal to the controller
80, while outputting the above newly inputted DTD 302 to the SGML tag extracting unit 30.
[0111] The SGML tag extracting unit 30 scans the received DTD 302 to extract tags defined in the DTD 302 (Step
C2), and outputs the extracted tags to the tag code table creating unit 40. Since the disagreement signal is outputted
from the DTD comparing unit 70 to the controller 80 at this time as above, the tag code table creating unit 40 receives
a direction to update the tag code table from the controller 80, assigns a predetermined code to each of the tags
extracted by the SGML tag extracting unit 30, and re-creates a tag code table (Step C3).

[0112] At this time, the document instance 303 of the SGML document inputted together with the DTD 302 is inputted
to the tag discriminating unit 50. When data of the inputted document instance 303 is a tag, the tag discriminating unit 50
outputs the tag to the tag coding unit 60. The tag coding unit 60 obtains a code corresponding to the received tag from the
tag code table created by the tag code table creating unit 40, and outputs the code as a code of the tag (Step C4).
[0113] If the comparison by the above DTD comparing unit 70 shows that the DTDs 302 agree with each other (YES
at Step C1), the DTD comparing unit 70 outputs the agreement signal to the controller 80. The controller 80 directs
the tag code table creating unit 40 to maintain (not update) the tag code table. The tag coding unit 60 thereby codes
tags in the document instance 303 on the basis of the tag code table created in the past, similarly to the above case
(Step C4).

[0114] The compressing apparatus 2 for an SGML document according to this embodiment codes tags in the
document instances 303 of all SGML documents having the same DTD 302 on the basis of a tag code table created with
respect to the first document among them. It is therefore unnecessary to create a tag code table for each SGML
document, so that the compressing apparatus 2 can perform the tag coding process at an extremely high speed.
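
The flow of Steps C1 through C4 can be sketched in Python as follows. This is only an illustrative sketch, not the apparatus itself: the class name `CachedTagCoder`, the regular expressions, and the use of consecutive integers as the "predetermined codes" are assumptions made for the example, and the DTD scan recognises nothing beyond simple `<!ELEMENT name ...>` declarations.

```python
import re

# Hypothetical pattern for a start tag or end tag in the document instance.
TAG_PATTERN = re.compile(r"(</?[A-Za-z][\w.-]*>)")

def extract_tags(dtd):
    """Step C2: list a start tag and an end tag for every declared element."""
    names = re.findall(r"<!ELEMENT\s+([A-Za-z][\w.-]*)", dtd)
    return [tag for name in names for tag in (f"<{name}>", f"</{name}>")]

class CachedTagCoder:
    def __init__(self):
        self.previous_dtd = None   # DTD held by the comparing step
        self.table = {}            # tag -> code (assumed: consecutive integers)
        self.rebuilds = 0          # number of times the table was (re)created

    def load_dtd(self, dtd):
        if dtd == self.previous_dtd:      # Step C1, YES: maintain the table
            return
        self.previous_dtd = dtd           # Step C1, NO: re-create the table
        tags = extract_tags(dtd)
        self.table = {tag: code for code, tag in enumerate(tags)}  # Step C3
        self.rebuilds += 1

    def code_instance(self, instance):
        """Step C4: replace each known tag by its code; other data passes through."""
        pieces = [p for p in TAG_PATTERN.split(instance) if p]
        return [self.table.get(p, p) for p in pieces]

dtd = "<!ELEMENT TITLE - - (#PCDATA)>\n<!ELEMENT SECTION - - (#PCDATA)>"
coder = CachedTagCoder()
coder.load_dtd(dtd)
first = coder.code_instance("<TITLE>Report</TITLE>")
coder.load_dtd(dtd)                       # same DTD again: table is reused
second = coder.code_instance("<SECTION>Body</SECTION>")
```

In this sketch the second call to `load_dtd` returns immediately, which corresponds to the agreement signal causing the table to be maintained rather than re-created.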

[0115] Meanwhile, depending on the environment in which SGML is used, there is a case where it is already
established between a provider (server) and a receiver (client) of a document what kind of DTD 302 is used in the SGML
documents to be sent. In such a case, it is unnecessary to hand over portions other than the document instance 303 to the
receiver each time.

[0116] In the case where a format of the DTD 302 to be used is unified in advance and the DTDs 302 of all documents
are the same, such as documents in the HTML format used in the WWW on the Internet, a tag code table first created by
the tag code table creating unit 40 is fixedly used under the control of the controller 80, whereby the tag coding process
can be performed at a higher speed.

[0117] In the above embodiment, the controller 80 directly controls the creating process of a tag code table in the tag
code table creating unit 40 so as to maintain/update the tag code table. It is alternatively possible that the controller 80
maintains/updates the tag code table by controlling the extracting process in the SGML tag extracting unit 30
(allowing/inhibiting extraction of tags according to a result of comparison of DTDs 302).


(b2) Description of a Decompressing Apparatus (Decoding Side) for an SGML Document 

[0118] FIG. 8 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML
document according to the second embodiment of this invention. A decompressing apparatus 3 shown in FIG. 8
corresponds to a decoding side of the compressing apparatus 2 described above with reference to FIGS. 6 and 7, and
additionally has a DTD comparing unit 70' and a controller 80' as compared with the decompressing apparatus shown
in FIG. 4, which are similar to those described above with reference to FIG. 6.

[0119] In the decompressing apparatus 3 for an SGML document according to this embodiment, the tag decoding unit
60' decodes coded tags on the basis of a tag decode table created with respect to the first document among a plurality
of documents by the tag decode table creating unit 40' while SGML documents having the same DTD 302 are inputted,
as on the coding side. When an SGML document having a different DTD 302 is inputted, the tag decode table
creating unit 40' re-creates a tag decode table, and the tag decoding unit 60' decodes tags on the basis of the re-created
tag decode table.

[0120] Next, detailed description will be made of the above operation with reference to a flowchart (Steps D1 through
D4) shown in FIG. 9. When a DTD 302 is newly inputted to the decompressing apparatus 3, the DTD comparing unit
70' compares the newly inputted DTD 302 with a DTD 302 inputted in the past (Step D1). If the comparison shows
that the DTDs 302 do not agree with each other (NO at Step D1), the DTD comparing unit 70' outputs the
disagreement signal to the controller 80', while outputting the newly inputted DTD 302 to the SGML tag extracting unit 30'.
[0121] The SGML tag extracting unit 30' scans the received DTD 302 to extract tags defined in the DTD 302 (Step
D2), and outputs the extracted tags to the tag decode table creating unit 40'. Since the disagreement signal is outputted
from the DTD comparing unit 70' to the controller 80' at this time, the tag decode table creating unit 40' receives a
direction to update the tag decode table from the controller 80'. Therefore, the tag decode table creating unit 40' assigns a
predetermined code to each of the tags extracted by the SGML tag extracting unit 30' so as to re-create the tag decode
table (Step D3).

[0122] The document instance 303 of the coded SGML document inputted together with the DTD 302 is inputted to
the tag discriminating unit 50'. When inputted data of the document instance 303 is a code of a tag, the tag discriminating
unit 50' outputs the code to the tag decoding unit 60'. The tag decoding unit 60' obtains a symbol (tag) corresponding to the
received code from the tag decode table created by the tag decode table creating unit 40', and outputs the tag as a
result of the decoding (Step D4).

[0123] If the above comparison by the DTD comparing unit 70' shows that the DTDs agree with each other (YES
at Step D1), the DTD comparing unit 70' outputs the agreement signal to the controller 80'. The controller 80' directs
the tag decode table creating unit 40' to maintain (not update) the tag decode table. The tag decoding unit 60' decodes
the coded tags in the document instance 303 on the basis of the tag decode table created in the past, similarly to the
above (Step D4).

[0124] The decompressing apparatus 3 for an SGML document according to this embodiment decodes tags in the
document instances 303 of all SGML documents on the basis of a tag decode table created with respect to the first
SGML document among a plurality of SGML documents having the same DTD 302. It is therefore unnecessary to
create a tag decode table for each SGML document, so that the decompressing apparatus 3 can perform the tag decoding
process at an extremely high speed.

[0125] In the case where a format of the DTD 302 to be used is unified in advance and the DTDs 302 of all documents
are the same, such as documents in the HTML format, the above decompressing apparatus 3 fixedly uses a tag decode
table first created by the tag decode table creating unit 40' under the control of the controller 80' so as to perform the tag
decoding process at a higher speed.

[0126] In the above embodiment, the controller 80' directly controls the creating process of the tag decode table in
the tag decode table creating unit 40', thereby maintaining/updating the tag decode table. However, it is alternatively
possible that the controller 80' controls the extracting process in the SGML tag extracting unit 30' (permits/inhibits
extraction of tags according to a result of comparison of DTDs 302) so as to maintain/update the tag decode table.

(c) Description of a Third Embodiment 


(c1) Description of a Compressing Apparatus (Coding Side) for an SGML Document 

[0127] FIG. 10 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML
document according to a third embodiment of this invention. As shown in FIG. 10, a compressing apparatus 2 for an
SGML document according to the third embodiment has an SGML tag extracting unit 100, a memory 101, an SGML tag
detecting unit 102, a coding process unit 103a and a COC outputting unit 106.

[0128] The SGML tag extracting unit 100 scans the DTD 302 (refer to FIG. 31) of an inputted SGML document to
extract tags defined in the DTD 302. The memory (tag storing unit) 101 fulfils a function as a tag code table creating
unit. The memory 101 successively stores tags extracted by the SGML tag extracting unit 100, and assigns address
information and length information on a tag in the memory 101 to each of the tags as a code of the tag, thereby creating
a tag code table.

[0129] When a document shown in FIG. 11 is inputted as the document instance 303 (one character in the document
is assumed to be one byte), for example, tags such as "TITLE", "/TITLE", "SECTION", "/SECTION", "SUBSECTION",
"/SUBSECTION", etc. extracted by the SGML tag extracting unit 100 are successively stored at an address "00" and
the following addresses of the memory 101. Accordingly, "0005" obtained by combining a "00" address with "05"
representing a length of the tag (5 bytes) is assigned to <TITLE>, and "0c07" obtained by combining a "0c(HEX)" address
with "07" representing a length (7 bytes) of the tag is assigned to <SECTION>.
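
A minimal Python sketch of this address-and-length coding is given below. It assumes that tag names are packed contiguously from address 00 with no separators and that the address and the length are each written as two hexadecimal digits; the function names are hypothetical. Under that packing assumption "TITLE" receives "0005" as in the text, but "SECTION" lands at address 0b rather than the 0c quoted above, so the actual layout of FIG. 11, which is not reproduced in this section, presumably differs in some detail.

```python
def build_tag_memory(tags):
    """Store tags back to back and derive each code from (address, length)."""
    memory = bytearray()
    table = {}
    for tag in tags:
        data = tag.encode("ascii")
        # Code = 2 hex digits of start address + 2 hex digits of length (assumed format).
        table[tag] = f"{len(memory):02x}{len(data):02x}"
        memory += data
    return bytes(memory), table

def fetch_tag(memory, code):
    """Decoding side: slice the tag straight out of memory using the code."""
    address, length = int(code[:2], 16), int(code[2:], 16)
    return memory[address:address + length].decode("ascii")

memory, table = build_tag_memory(["TITLE", "/TITLE", "SECTION", "/SECTION"])
title_code = table["TITLE"]
section_code = table["SECTION"]
restored = fetch_tag(memory, section_code)
```

Because the code itself says where the tag is stored and how long it is, no separate lookup structure is needed: storing the tags is enough to define the table, and decoding is a single slice.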

[0130] The SGML tag detecting unit (tag discriminating unit) 102 determines whether data of the document instance
303 of the inputted SGML document is one of the tags extracted by the SGML tag extracting unit 100 or not, thereby
detecting a tag used in the document instance 303. According to this embodiment, the tag is detected by determining
whether data of the inputted document instance 303 (hereinafter occasionally referred to as document instance data)
coincides with a tag stored in the memory 101.

[0131] When the above SGML tag detecting unit 102 determines that the above inputted data is a tag, the coding
process unit 103a codes the inputted data on the basis of stored contents in the memory 101 created as a tag code
table. When the above SGML tag detecting unit 102 determines that the above inputted data is not a tag, the coding
process unit 103a codes the inputted data in a predetermined coding system (universal coding system or the like).

[0132] The above coding process unit 103a therefore has a tag coding unit 103, a second coding unit 104 and a
switching control unit 105, as shown in FIG. 10.

[0133] The tag coding unit (first coding unit) 103 codes inputted data on the basis of the above tag code table (stored
contents of the memory 101). The second coding unit 104 codes inputted data in a predetermined coding system such
as a universal coding system or the like. The switching control unit 105 outputs the inputted data to the tag coding unit
103 when the SGML tag detecting unit 102 determines that the inputted data is a tag. When the SGML tag detecting
unit 102 determines that the inputted data is not a tag, the switching control unit 105 outputs the inputted data to the
second coding unit 104.

[0134] When the coding of the tag is completed, the above tag coding unit 103 notifies the SGML tag detecting unit
102 of it. When receiving the notification, the SGML tag detecting unit 102 again performs the tag detecting process on
the next document instance data.

[0135] When the SGML tag detecting unit 102 determines that the above inputted data is a tag, the COC outputting
unit (special code outputting unit) 106 outputs a special code (COC: Change Of Coding) representing coding of a tag
(switching of the coding system) to a decoding side of the tag, described later, before the inputted data is coded in the tag
coding unit 103.

[0136] Next, detailed description will be made of an operation of the compressing apparatus 2 for an SGML document
with the above structure according to the third embodiment with reference to a flowchart (Steps E1 through E6) shown
in FIG. 12.

[0137] The compressing apparatus 2 scans the inputted DTD 302 by the SGML tag extracting unit 100 to extract
tags defined in the DTD 302, successively stores the extracted tags in the memory 101, and assigns address
information and length information of the memory 101 to each of the tags as a code of the tag to create a tag code table
(Step E1).

[0138] The compressing apparatus 2 determines whether the inputted document instance data is a tag or not by the
SGML tag detecting unit 102 (Step E2). If the inputted document instance data is a tag, the compressing apparatus 2
directs the COC outputting unit 106 to output COC, while directing the switching control unit 105 of the coding process
unit 103a to switch output of the document instance data to the tag coding unit 103, whereby the COC outputting unit
106 outputs COC to the decoding side to be described later (from YES route at Step E2 to Step E3). The tag coding
unit 103 refers to the memory 101 on the basis of the inputted data (tag), and outputs a code (address and length)
corresponding to the tag as a code of the tag (Step E4).

[0139] If the document instance data which is an object of the coding is not a tag at the above Step E2, the
compressing apparatus 2 directs the switching control unit 105 to switch the output of the document instance data to the second
coding unit 104 so that the second coding unit 104 codes the document instance data (character or character string) in
a predetermined coding system (from NO route at Step E2 to Step E5).

[0140] The compressing apparatus 2 determines whether the coding is completed or not (Step E6). If the coding is
not completed (if some of the document instance data still remain), the compressing apparatus 2 repeats the process
from the above Step E2 until the coding is completed (NO route at Step E6). If the coding is completed, the compressing
apparatus 2 terminates the compressing process (YES route at Step E6).
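
The loop of Steps E2 through E6 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the output is a list of symbolic items rather than a bit stream, `COC` is a stand-in string, and `second_coder` is a hypothetical placeholder for the predetermined coding system (the universal coding itself is not implemented here).

```python
import re

COC = "COC"  # symbolic stand-in for the special Change Of Coding code

def compress_instance(instance, tag_table, second_coder):
    """Emit COC plus a tag code for each tag; pass other data to the second coder."""
    output = []
    # Split into candidate tags ("<...>") and the text between them.
    for piece in re.split(r"(<[^<>]+>)", instance):
        if not piece:
            continue
        if piece in tag_table:                # Step E2, YES route
            output.append(COC)                # Step E3: announce the switch
            output.append(tag_table[piece])   # Step E4: code from the tag code table
        else:                                 # Step E2, NO route
            output.append(second_coder(piece))  # Step E5: second coding unit
    return output                             # Step E6: all data consumed

tag_table = {"<B>": "0", "</B>": "1"}
coded = compress_instance("ab<B>cd</B>e", tag_table, second_coder=str.upper)
```

Here `str.upper` merely makes the output of the second path visible; any function taking the non-tag text could be substituted.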
[0141] Assuming here, as shown in FIG. 13, that

^ H l4<B>lltiX</B>^i-o " 

is inputted as document instance data (Step F1), codes "0" and "1" are assigned to tags <B> and </B>, respectively, a
tag code table 101a is created, and codes shown in FIG. 13 are assigned to respective characters other than these tags
(that is, a code table 104a for the second coding unit 104 is created).

[0142] In the above document instance data, COC ("10") is inserted before each of the tags <B> and </B>; after that,
each of the tags is coded on the basis of the tag code table 101a by the tag coding unit 103 (Step F2). The characters
other than the tags are coded on the basis of the code table 104a by the second coding unit 104.
[0143] As a result, the above document instance data is finally coded into codes "ff9e7b2eb2" in hexadecimal notation
(HEX), or "11111/11110/0111/10/0/11110/1100/10/1/1101/0110/010" in binary notation, as shown in FIG. 13 (Step
F3).
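
As a check on the arithmetic, the binary groups quoted above can be packed into hexadecimal with a few lines of Python. The only assumption is that the final group is read as "010", which makes the total length 40 bits, a whole number of hexadecimal digits.

```python
# Binary code groups as quoted in the text, separated by "/".
groups = "11111/11110/0111/10/0/11110/1100/10/1/1101/0110/010".split("/")

bits = "".join(groups)                      # concatenate into one bit string
hex_digits = len(bits) // 4                 # 4 bits per hexadecimal digit
hex_form = f"{int(bits, 2):0{hex_digits}x}"
```

The twelve groups consist of eight character codes, two occurrences of COC ("10") and the two one-bit tag codes ("0" and "1").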

[0144] The compressing apparatus 2 for an SGML document according to the third embodiment of this invention
outputs COC to the tag decoding side, and codes inputted data on the basis of a tag code table by the tag coding unit 103
when the inputted document instance data is a tag. When the document instance data is not a tag, the second coding
unit 104 codes the document instance data in a predetermined coding system. It is therefore possible to compress very
efficiently not only tags in an SGML document but also the document other than the tags, so as to further decrease the
quantity of data of the SGML document.

[0145] Since the COC outputting unit 106 outputs COC to the decoding side, the tag decoding side can readily
discriminate a tag, as will be described later. This largely contributes to speeding-up of the decoding process. Incidentally,
the COC outputting unit 106 may be omitted if the process on the decoding side is not taken into account.

[0146] Since the coding process unit 103a has the tag coding unit 103, the second coding unit 104 and the switching
control unit 105 according to this embodiment, the function of the coding process unit 103a can be realized with a
simple structure.

[0147] Since the memory 101 as the tag code table creating unit of this embodiment assigns information on an
address and a length of a tag in the memory 101 as a code of the tag to create the tag code table, a code is assigned
to each tag only by successively storing tags in the memory 101. It is therefore possible to create the tag code table
at a high speed with a structure so simple that only one memory 101 is provided.

[0148] As will be described later, the tag decoding side can readily specify a tag to be decoded on the basis of the
address and the length. This largely contributes to speeding-up of the tag decoding process.

[0149] A code to be assigned to a tag is not necessarily information on the above address and length; any
information is applicable so long as it includes at least address information.

(c2) Description of a Decompressing Apparatus (Decoding Side) for an SGML Document 

[0150] FIG. 14 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML
document according to a third embodiment of this invention. A decompressing apparatus 3 shown in FIG. 14
corresponds to the decoding side of the compressing apparatus 2 described above with reference to FIGS. 10 through 13,
and has an SGML tag extracting unit 200, a memory 201, a COC discriminating unit 202 and a decoding process unit
203a.

[0151] The SGML tag extracting unit 200 scans the DTD 302 (refer to FIG. 31) of an inputted SGML document to
extract tags defined in the DTD 302. The memory 201 fulfils a function as the tag decode table creating unit. The
memory 201 successively stores the tags extracted by the SGML tag extracting unit 200, and assigns address information and
length information on a tag in the memory 201 as a code of the tag so as to create the tag decode table, as in the case
of the coding side.

[0152] The COC discriminating unit (special code discriminating unit) 202 determines whether the inputted coded
data is COC representing that coded data of a tag is inputted. When the COC discriminating unit 202 determines that
the inputted coded data is COC, the decoding process unit 203a decodes coded data (i.e., a code of a tag) following
the COC on the basis of the tag decode table. When the COC discriminating unit 202 determines that the inputted
coded data is not COC, the decoding process unit 203a decodes the coded data in a predetermined decoding system.

[0153] The above decoding process unit 203a has, as shown in FIG. 14, a tag decoding unit 203, a second decoding
unit 204 and a switching control unit 205.

[0154] The tag decoding unit (first decoding unit) 203 decodes the inputted coded data on the basis of stored contents
of the memory 201 created as the above tag decode table. The second decoding unit 204 decodes the inputted coded
data in a predetermined decoding system. In this case, the second decoding unit 204 performs the decoding process
in a decoding system corresponding to the coding system on the coding side.

[0155] When the COC discriminating unit 202 determines that the inputted coded data is COC, the switching control
unit 205 outputs coded data inputted following the COC to the tag decoding unit 203. When the COC discriminating unit
202 determines that the inputted coded data is not COC, the switching control unit 205 outputs the coded data to the
second decoding unit 204.

[0156] Next, detailed description will be made of an operation of the decompressing apparatus 3 for an SGML
document with the above structure according to the third embodiment with reference to a flowchart (Steps G1 through G5)
shown in FIG. 15.

[0157] The decompressing apparatus 3 scans the inputted DTD 302 by the SGML tag extracting unit 200 to extract
tags defined in the DTD 302, successively stores the extracted tags in the memory 201, and assigns address information
and length information on a tag in the memory 201 to each of the tags as a code of the tag, thereby creating a tag
decode table having the same stored contents as the coding side (Step G1).

[0158] The decompressing apparatus 3 determines whether the inputted coded data is COC or not by the COC
discriminating unit 202 (Step G2). If the inputted coded data is COC, the decompressing apparatus 3 instructs the
switching control unit 205 of the decoding process unit 203a to switch output of the coded data to the tag decoding unit 203.
The tag decoding unit 203 refers to the memory 201 on the basis of coded data (a code of a tag; address and length)
following the COC, and outputs a symbol (tag) corresponding to the coded data as a result of the decoding (Step G3).

[0159] When the discrimination at the above Step G2 shows that the coded data which is an object of the decoding
is not COC, the decompressing apparatus 3 directs the switching control unit 205 to switch the output of the coded data
to the second decoding unit 204, so that the second decoding unit 204 decodes the coded data (character or character
string) in a decoding system corresponding to the coding system on the coding side (from NO route at Step G2 to Step
G4).

[0160] The decompressing apparatus 3 determines whether the decoding is completed or not (Step G5). If the
decoding is not completed (if some of the coded data still remain), the decompressing apparatus 3 repeats the process from
the above Step G2 until the decoding is completed (NO route at Step G5). If the decoding is completed, the
decompressing apparatus 3 terminates the decoding process (YES route at Step G5).
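
Steps G2 through G5 can be sketched in Python using the bit string given for FIG. 13 on the coding side. Several things here are assumptions for illustration only: the symbols `c1` to `c7` are hypothetical placeholders for the characters of FIG. 13 (the figure is not reproduced in this section), the tag codes are taken to be the one-bit codes "0" and "1" for <B> and </B> rather than address-and-length codes, and COC ("10") is treated as one more prefix-free code word alongside the character codes, which is inferred from the bit string rather than stated in the text.

```python
def read_symbol(bits, pos, table):
    """Return (symbol, next_pos) for the prefix-free code word starting at pos."""
    for end in range(pos + 1, len(bits) + 1):
        word = bits[pos:end]
        if word in table:
            return table[word], end
    raise ValueError(f"no code word matches at bit {pos}")

def decompress(bits, char_table, tag_table, coc="10"):
    escape = object()                       # marker meaning "COC was read"
    first_table = dict(char_table)
    first_table[coc] = escape
    output, pos = [], 0
    while pos < len(bits):                  # Step G5: repeat until data is exhausted
        symbol, pos = read_symbol(bits, pos, first_table)      # Step G2
        if symbol is escape:
            symbol, pos = read_symbol(bits, pos, tag_table)    # Step G3: tag decoding
        output.append(symbol)               # otherwise Step G4: second decoding
    return output

# Hypothetical character code table; c1..c7 stand in for the characters of FIG. 13.
char_table = {"11111": "c1", "11110": "c2", "0111": "c3", "1100": "c4",
              "1101": "c5", "0110": "c6", "010": "c7"}
tag_table = {"0": "<B>", "1": "</B>"}

bits = f"{int('ff9e7b2eb2', 16):040b}"
decoded = decompress(bits, char_table, tag_table)
```

The decoder never has to test whether a code is a tag: reading COC is the only trigger for switching to the tag decode table, which is the point made about decoding speed in this embodiment.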

[0161] In the decompressing apparatus 3 for an SGML document according to the third embodiment, the tag decoding
unit 203 decodes coded data following COC on the basis of the tag decode table when the inputted coded data is the
COC. If the inputted coded data is not COC, the second decoding unit 204 decodes the coded data in a decoding
system corresponding to the coding system on the coding side. It is therefore possible to decompress not only tags but also
a coded document other than the tags, very efficiently and accurately.

[0162] Whether or not coded data that is an object of the decoding is a tag can be determined only by detecting COC,
so that the tag decoding process can be performed at a very high speed.

[0163] Since the decoding process unit 203a of this embodiment has the tag decoding unit 203, the second decoding
unit 204 and the switching control unit 205, a function of the decoding process unit 203a can be readily realized with a
simple structure.

[0164] Since the memory 201 as the above tag decode table creating unit assigns address information and length
information on a tag in the memory 201 to a tag as a code of the tag to create the tag decode table, a code is
automatically assigned to each tag only by successively storing tags in the memory 201, so that the tag decode table having the
same contents as the coding side is created. It is therefore possible to perform the decoding process of tags at a high
speed and accurately, even with an extremely simple structure.

[0165] According to this embodiment, address information and length information in the memory 201 are used as they
are as a code of a tag. So long as the tag is coded as a code consisting of address information and length information
on the coding side, it is possible to readily fetch a tag corresponding to coded data of the tag from the memory 201. This
largely contributes to speeding-up of the tag decoding process.

[0166] A code to be assigned to a tag is not necessarily information on address and length as above. Any information
including at least address information is applicable so long as it corresponds to that on the coding side.

[0167] The above decompressing apparatus 3 switches between decoding of a tag and decoding of a character
(character string) other than tags at a timing of COC detection. However, if codes are assigned in such a manner that codes
of characters (strings) other than tags do not coincide with codes of the tags, an SGML discriminating unit 202' for
determining whether inputted coded data is a tag or not may be provided instead of the above COC discriminating unit 202,
as shown in FIG. 16, for example, thereby switching from the decoding of a character (string) other than tags to the decoding
of a tag at a timing of detection of the tag itself.






(d) Description of a Fourth Embodiment 

(d1) Description of a Compressing Apparatus (Coding Side) for an SGML Document

[0168] FIG. 17 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML
document according to a fourth embodiment of this invention. As shown in FIG. 17, a compressing apparatus 2 for an
SGML document according to the fourth embodiment has a dictionary creating unit 107 and a dictionary updating unit
108 as the tag code table creating unit 101', instead of the memory 101 shown in FIG. 10.

[0169] The dictionary creating unit (first coding dictionary creating unit) 107 assigns a predetermined initial code to
each of the tags extracted by the SGML tag extracting unit 100 so as to create a dictionary of the tags (statistical dynamic
dictionary: first coding dictionary) as a tag code table. The dictionary updating unit (coding dictionary updating unit) 108
updates a code in the dictionary created by the dictionary creating unit 107 according to the frequency of occurrence of
a tag when the tag is coded by the coding process unit 103a (tag coding unit 103). According to this embodiment, a
shorter code (a code having a length inversely proportional to the frequency of occurrence) is assigned to a tag more
frequently occurring.

[0170] The compressing apparatus 2 for an SGML document according to the fourth embodiment updates the
dictionary (code table) used when a tag is coded in consideration of the frequency of occurrence of the tag, and codes the
tag.

[0171] Next, detailed description will be made of an operation of the compressing apparatus 2 for an SGML document
with the above structure according to the fourth embodiment with reference to a flowchart (Steps H1 through H8) shown
in FIG. 18.

[0172] The compressing apparatus 2 scans the inputted DTD 302 by the SGML tag extracting unit 100 to extract tags
defined in the DTD 302 (Step H1), and outputs the tags to the dictionary creating unit 107 of the tag code table creating
unit 101'. The dictionary creating unit 107 successively assigns a predetermined initial code to each of the inputted tags
so as to create a tag code table (Step H2).

[0173] The compressing apparatus 2 determines whether data of the document instance 303 inputted together with
the above DTD 302 is a tag or not by the SGML tag detecting unit 102 (Step H3). If the data is a tag, the compressing
apparatus 2 directs the COC outputting unit 106 to output COC, while directing the switching control unit 105 of the
coding process unit 103a to switch output of the document instance data to the tag coding unit 103.

[0174] The COC outputting unit 106 outputs COC to the decoding side described later (from YES route at Step H3 to
Step H4). The tag coding unit 103 refers to the dictionary (tag code table) created by the dictionary creating unit 107 on
the basis of the inputted data (tag), and outputs a code corresponding to the tag as a code of the tag (Step H5).

[0175] The compressing apparatus 2 calculates the frequency of occurrence of the tag coded by the tag coding unit
103, and re-assigns a code according to a result of the calculation (a code shorter than the initial code) to the coded tag to
update the dictionary by the dictionary updating unit 108 (Step H6).

[0176] If the document instance data that is an object of the coding is not a tag at the above Step H3, the compressing
apparatus 2 directs the switching control unit 105 to switch the output of the document instance data to the second
coding unit 104. The second coding unit 104 codes the document instance data (character or character string) in a
predetermined coding system (from NO route at Step H3 to Step H7).

[0177] The compressing apparatus 2 determines whether the coding is completed or not (Step H8). If the coding is
not completed (if some of the document instance data still remain), the compressing apparatus 2 repeats the process
from the above Step H3 until the coding is completed (NO route at Step H8). If the coding is completed, the
compressing apparatus 2 terminates the compressing process (YES route at Step H8).

[0178] The compressing apparatus 2 for an SGML document according to the fourth embodiment assigns a
predetermined initial code to each of the tags extracted by the SGML tag extracting unit 100 to create a dictionary of the
tags, and updates a code in the dictionary according to the frequency of occurrence of a tag in such a manner that a tag
more frequently occurring has a shorter code when the tag is coded. Accordingly, the compressing apparatus 2 re-assigns
a shorter code to a tag more frequently occurring as the coding of tags proceeds. This largely improves the compression
rate of tags.
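
The dictionary update of the fourth embodiment can be sketched as follows. This is only an illustration under stated assumptions: a tag's code is modelled as its rank in a frequency-ordered list (a lower rank standing in for a shorter code), and the names `AdaptiveTagDictionary` and `encode_tags` are hypothetical. The text itself does not fix a concrete code format.

```python
from collections import Counter


class AdaptiveTagDictionary:
    """Tag dictionary whose codes shrink for frequently seen tags.

    Assumption: the code of a tag is its rank in a frequency-ordered list,
    so rank 0 plays the role of the shortest code.
    """

    def __init__(self, dtd_tags):
        # Initial codes follow the order in which tags were extracted
        # from the DTD (role of the dictionary creating unit).
        self.order = list(dtd_tags)
        self.counts = Counter()

    def code_of(self, tag):
        return self.order.index(tag)

    def update(self, tag):
        # Role of the dictionary updating unit: count, then re-rank.
        self.counts[tag] += 1
        # list.sort is stable, so tags with equal counts keep their order.
        self.order.sort(key=lambda t: -self.counts[t])


def encode_tags(dtd_tags, tag_sequence):
    dictionary = AdaptiveTagDictionary(dtd_tags)
    codes = []
    for tag in tag_sequence:
        codes.append(dictionary.code_of(tag))  # Step H5: output current code
        dictionary.update(tag)                 # Step H6: re-assign codes
    return codes


codes = encode_tags(["<doc>", "<title>", "<p>"], ["<p>", "<p>", "<p>", "<title>"])
```

In this example the first `<p>` is coded with its initial rank, while later occurrences receive rank 0 because the tag has moved to the front of the dictionary.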

(d2) Description of a Decompressing Apparatus (Decoding Side) for an SGML document

[0179] FIG. 19 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML
document according to the fourth embodiment of this invention. A decompressing apparatus 3 shown in FIG. 19
corresponds to the decoding side of the compressing apparatus 2 described above with reference to FIGS. 17 and 18.
According to this embodiment, the decompressing apparatus 3 has a dictionary creating unit 207 and a dictionary
updating unit 208 as the tag decode table creating unit 201', as compared with the decompressing apparatus 3 shown
in FIG. 14.

[0180] The dictionary creating unit (first decoding dictionary creating unit) 207 assigns a predetermined initial code
to each of the tags extracted by the SGML tag extracting unit 200 so as to create a dictionary of the tags (first decoding
dictionary) as a tag decode table. According to this embodiment, the dictionary creating unit 207 assigns an initial code
to each tag by the same rule as the above coding side.

[0181] The dictionary updating unit (decoding dictionary updating unit) 208 updates (re-assigns) a code in the
dictionary created by the dictionary creating unit 207 when the tag is decoded by the decoding process unit 203a (tag
decoding unit 203) according to the frequency of occurrence of the tag in such a manner that a code of a tag more
frequently occurring has a shorter length.

[0182] Next, detailed description will be made of an operation of the decompressing apparatus 3 for an SGML
document with the above structure according to the fourth embodiment with reference to a flowchart (Steps J1 through J7)
shown in FIG. 20.

[0183] The decompressing apparatus 3 scans the inputted DTD 302 to extract tags defined in the DTD 302 by the
SGML tag extracting unit 200 (Step J1), and outputs the tags to the dictionary creating unit 207 of the tag decode table
creating unit 201'. The dictionary creating unit 207 successively assigns an initial code to each of the received tags by
the same rule for assigning initial codes as the coding side, so as to create the dictionary (tag decode table) (Step J2).
[0184] The decompressing apparatus 3 determines whether inputted coded data is COC or not by the COC
discriminating unit 202 (Step J3). If the inputted coded data is COC, the decompressing apparatus 3 directs the switching
control unit 205 of the decoding process unit 203a to switch output of the coded data to the tag decoding unit 203. The tag
decoding unit 203 refers to the dictionary created by the dictionary creating unit 207 on the basis of coded data following
the COC, and outputs a symbol (tag) corresponding to the coded data as a result of the decoding (Step J4).
[0185] The decompressing apparatus 3 calculates the frequency of occurrence of the tag decoded by the tag decoding
unit 203, and re-assigns a code according to a result of the calculation (a code shorter than the initial code) to the
decoded tag to update the dictionary by the dictionary updating unit 208 (Step J5).

[0186] If the coded data that is an object of the decoding is not COC at the above Step J3, the decompressing
apparatus 3 directs the switching control unit 205 to switch the output of the coded data to the second decoding unit 204.
The second decoding unit 204 decodes the coded data (character or character string) in a decoding system
corresponding to the coding system on the coding side (from NO route at Step J3 to Step J6).

[0187] The decompressing apparatus 3 determines whether the decoding is completed or not (Step J7). If the decoding
is not completed (if some of the coded data still remain), the decompressing apparatus 3 repeats the process from
the above Step J3 until the decoding is completed (NO route at Step J7). If the decoding is completed, the
decompressing apparatus 3 terminates the decompressing process (YES route at Step J7).

[0188] The decompressing apparatus 3 for an SGML document according to the fourth embodiment assigns a
predetermined initial code to each of the tags extracted by the SGML tag extracting unit 200 by the same rule as the coding
side so as to create a dictionary of the tags, and updates a code in the first decoding dictionary according to the
frequency of occurrence of a tag when the tag is decoded. As the decoding proceeds, a shorter code is re-assigned to
a tag more frequently occurring. This can largely improve the efficiency of decoding of tags and enables accurate
decoding of coded tags.
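
Because both sides start from the same initial codes and apply the same update after every tag, the decoder can stay in step with the coder without receiving the dictionary. The following Python sketch illustrates this for Steps J3 through J6. It rests on assumptions: codes are modelled as ranks in a frequency-ordered list, the `COC` sentinel object merely stands in for the special code, text outside tags is passed through unchanged (the second coding and decoding units are omitted), and all function names are hypothetical.

```python
from collections import Counter

COC = object()  # stand-in for the special code announcing a coded tag


def _touch(order, counts, tag):
    """Count one occurrence and re-rank; must be identical on both sides."""
    counts[tag] += 1
    order.sort(key=lambda t: -counts[t])  # stable sort: ties keep prior order


def compress(dtd_tags, tokens):
    order, counts, stream = list(dtd_tags), Counter(), []
    for tok in tokens:
        if tok in order:                       # the token is a tag
            stream += [COC, order.index(tok)]  # COC, then the current code
            _touch(order, counts, tok)
        else:
            stream.append(tok)                 # ordinary text, left uncoded here
    return stream


def decompress(dtd_tags, stream):
    order, counts, out = list(dtd_tags), Counter(), []
    items = iter(stream)
    for item in items:
        if item is COC:                        # Step J3: COC detected
            tag = order[next(items)]           # Step J4: look up the code
            out.append(tag)
            _touch(order, counts, tag)         # Step J5: same update as the coder
        else:
            out.append(item)                   # Step J6: ordinary data
    return out


tags = ["<doc>", "</doc>", "<p>", "</p>"]
tokens = ["<doc>", "<p>", "a", "</p>", "<p>", "b", "</p>", "</doc>"]
stream = compress(tags, tokens)
restored = decompress(tags, stream)
```

If the two sides ever applied different update rules, or updated at different moments, the ranks would diverge and decoding would fail, which is why the text requires the same initial-code rule on both sides.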

(e) Description of a Fifth Embodiment 

(e1) Description of a Compressing Apparatus (Coding Side) for an SGML document

[0189] FIG. 21 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML
document according to a fifth embodiment of this invention. As shown in FIG. 21, a compressing apparatus 2 for an
SGML document according to the fifth embodiment has a code information outputting unit 112 and a buffer 113 in
addition to a code creating unit 109 as the tag code table creating unit 101', as compared with the compressing apparatus
2 shown in FIG. 17.

[0190] The above code creating unit (second coding dictionary creating unit) 109 calculates the frequency of
occurrence of a tag in the document instance 303 on the basis of tags extracted by the SGML tag extracting unit 100, and
assigns a code according to a result of the calculation to the tag so as to create a dictionary of tags (statistical
quasi-dynamic dictionary: second coding dictionary) as a tag code table. The code information outputting unit (occurrence
frequency information outputting unit) 112 outputs information on the frequency of occurrence of the above tag to the tag
decoding side.

[0191] The buffer 113 holds document instance data until the tag code table (dictionary) is created by the code
creating unit 109.

[0192] The above code creating unit 109 has, according to this embodiment, a tag counting unit 151, a tag holding
unit 152, a tag determining unit 153, a code generating unit 154 and a code holding unit 155, as shown in FIG. 22, for
example, to readily create the above statistical quasi-dynamic dictionary.

[0193] The tag counting unit 151 determines whether a tag extracted by the SGML tag extracting unit 100 coincides
with a tag in the document instance 303 or not to count the frequency of occurrence of the tag in the document instance
303. According to this embodiment, the tag extracted by the SGML tag extracting unit 100 and the tag in the document
instance 303 determined as a tag by the tag determining unit 153 are held in the tag holding unit 152, and the frequency
of occurrence of each tag is determined by counting the number of times of coincidence of each of the held tags.

[0194] The code generating unit 154 generates a code according to a result of the counting by the tag counting unit
151 as a code to be assigned to a tag. The code holding unit 155 relates the code generated by the code generating
unit 154 to the tag held in the tag holding unit 152 fed through the tag determining unit 153 and holds them, so as to
create a dictionary of tags.

[0195] The compressing apparatus 2 for an SGML document according to the fifth embodiment first creates a
dictionary (code table) of tags in consideration of the frequencies of occurrence of the tags in the document instance 303,
and codes the tags on the basis of the dictionary (not updating the dictionary) in the following coding process.
[0196] Next, detailed description will be made of an operation of the compressing apparatus 2 for an SGML document
with the above structure according to the fifth embodiment with reference to a flowchart (Steps K1 through K8) shown
in FIG. 23.

[0197] The compressing apparatus 2 scans the inputted DTD 302 to extract tags defined in the DTD 302 by the SGML
tag extracting unit 100 (Step K1), and outputs the tags to the code creating unit 109.

[0198] The code creating unit 109 holds the received tags in the tag holding unit 152, while determining whether data
in the inputted document instance 303 is a tag or not, thereby holding only tags in the document instance data in the
tag holding unit 152. The tag counting unit 151 counts the number of times of coincidence of each of the tags held in
the tag holding unit 152 to calculate the frequency of occurrence of each of the tags (Step K2).
[0199] The code creating unit 109 generates a code according to the frequency of occurrence of a tag obtained as
above by the code generating unit 154, assigns each code to a corresponding tag, and holds the code (creates a
dictionary of tags) by the code holding unit 155 (Step K3). At this time, the occurrence frequency information on each
of the tags counted by the tag counting unit 151 is outputted to the decoding side through the code information outputting
unit 112 as information used by the decoding side to create the same dictionary as the coding side.
[0200] The compressing apparatus 2 determines whether the inputted document instance data is a tag or not by the
SGML tag detecting unit 102 (Step K4). If the inputted document instance data is a tag, the compressing apparatus 2
directs the COC outputting unit 106 to output COC, and at the same time directs the switching control unit 105 in the
coding process unit 103a to switch output of the document instance data to the tag coding unit 103. The COC outputting
unit 106 outputs the COC to the decoding side described later (from YES route at Step K4 to Step K5). The tag coding
unit 103 refers to the dictionary created by the code creating unit 109 on the basis of the inputted data (tag), and outputs
a code corresponding to the tag as a code of the tag (Step K6).

[0201] If the document instance data that is an object of the coding is not a tag at the above Step K4, the compressing
apparatus 2 directs the switching control unit 105 to switch the output of the document instance data to the second
coding unit 104. The second coding unit 104 codes the document instance data (character or character string) in a
predetermined coding system (from NO route at Step K4 to Step K7).

[0202] The compressing apparatus 2 determines whether the coding is completed or not (Step K8). If the coding is
not completed (if some of the document instance data still remain), the compressing apparatus 2 repeats the process
from the above Step K4 until the coding is completed (NO route at Step K8). If the coding is completed, the compressing
apparatus 2 terminates the compressing process (YES route at Step K8).

[0203] The compressing apparatus 2 for an SGML document according to the fifth embodiment counts the frequency
of occurrence of a tag in the document instance 303, and assigns a code according to a result of the counting to the tag
(i.e., assigns a shorter code to a tag more frequently occurring) to create a dictionary of tags (statistical quasi-dynamic
dictionary). It is therefore possible to assign in advance a short code to a tag frequently occurring before the tag is coded.
[0204] Accordingly, it is unnecessary to update the dictionary each time the tag is coded, unlike the above statistical
dynamic dictionary, so that the compressing process can be sped up while a compression rate of tags is improved.
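
The two-pass procedure of the fifth embodiment (count first, then code with a fixed table) might be sketched as below. Huffman's construction is used here only as one familiar way of giving more frequent tags shorter codes; the text does not name a particular code, and the function names `build_static_tag_table` and `encode_tags` are hypothetical.

```python
import heapq
from collections import Counter


def build_static_tag_table(dtd_tags, instance_tokens):
    """Count tag occurrences first, then fix one prefix code per tag."""
    known = set(dtd_tags)
    freq = Counter(tok for tok in instance_tokens if tok in known)  # Step K2
    # Heap entries are (weight, unique tie-breaker, {tag: code so far}).
    heap = [(freq[t], i, {t: ""}) for i, t in enumerate(dtd_tags) if freq[t]]
    heapq.heapify(heap)
    if not heap:
        return freq, {}
    if len(heap) == 1:
        return freq, {tag: "0" for tag in heap[0][2]}
    next_id = len(dtd_tags)
    while len(heap) > 1:  # Step K3, here by Huffman's construction (an assumption)
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return freq, heap[0][2]


def encode_tags(table, instance_tokens):
    """Steps K4 to K6 for tags only: look up each tag, never update the table."""
    return "".join(table[tok] for tok in instance_tokens if tok in table)


dtd_tags = ["<doc>", "<title>", "<p>"]
tokens = ["<doc>", "<title>", "t", "<p>", "a", "<p>", "b", "<p>", "c", "<p>", "d"]
freq, table = build_static_tag_table(dtd_tags, tokens)
bits = encode_tags(table, tokens)
```

The whole document instance has to be available before the first tag is coded, which corresponds to the role of the buffer 113. The returned `freq` is the kind of information the code information outputting unit 112 would send to the decoding side.
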
[0205] In the above compressing apparatus 2, the code information outputting unit 112 outputs information on the
frequency of occurrence of a tag to the decoding side of the tag. Therefore, the decoding side can readily make the
same dictionary as the coding side. This largely contributes to improvement of accuracy in the tag decoding process on
the decoding side. Incidentally, it is possible to send not information on the frequency of occurrence of a tag, but
information on a code table created on the coding side to the decoding side.

(e2) Description of a Decompressing Apparatus (Decoding Side) for an SGML document

[0206] FIG. 24 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML
document according to the fifth embodiment of this invention. A decompressing apparatus 3 shown in FIG. 24
corresponds to the decoding side of the compressing apparatus 2 described above with reference to FIGS. 21 through 23.
According to this embodiment, the decompressing apparatus 3 has a buffer 213, in addition to a code creating unit 209
as the tag decode table creating unit 201', instead of the memory 201 shown in FIG. 14.

[0207] The above code creating unit (second decoding dictionary creating unit) 209 creates a dictionary (statistical
quasi-dynamic dictionary: second decoding dictionary) of tags having the same code contents as the coding side as a
tag decode table on the basis of the tags extracted by the SGML tag extracting unit 200 and the information on the
frequency of occurrence of each tag sent through the code information outputting unit 112 on the coding side.
[0208] The buffer 213 holds inputted coded data until the code creating unit 209 creates a tag decode table
(dictionary).

[0209] Next, detailed description will be made of an operation of the decompressing apparatus 3 for an SGML
document with the above structure according to the fifth embodiment with reference to a flowchart (Steps L1 through L6)
shown in FIG. 25.

[0210] The decompressing apparatus 3 scans the inputted DTD 302 to extract tags defined in the DTD 302 by the
SGML tag extracting unit 200 (Step L1), and outputs the tags to the dictionary creating unit 209 of the tag decode table
creating unit 201'. The dictionary creating unit 209 creates a decode table (dictionary) of tags having the same contents
of codes as the code table created on the coding side on the basis of the received tags and information on the
frequency of occurrence of each of the tags sent from the coding side (Step L2).

[0211] The decompressing apparatus 3 determines whether inputted coded data is COC or not by the COC
discriminating unit 202 (Step L3). If the inputted coded data is COC, the decompressing apparatus 3 directs the switching
control unit 205 of the decoding process unit 203a to switch output of the coded data to the tag decoding unit 203. The tag
decoding unit 203 refers to the dictionary created by the dictionary creating unit 209 on the basis of coded data following
the COC, and outputs a symbol (tag) corresponding to the coded data as a result of the decoding (from YES route at
Step L3 to Step L4).

[0212] If the coded data that is an object of the decoding is not COC, the decompressing apparatus 3 directs the
switching control unit 205 to switch the output of the coded data to the second decoding unit 204. The second decoding
unit 204 decodes the coded data (character or character string) in a decoding system corresponding to the coding
system on the coding side (from NO route at Step L3 to Step L5).

[0213] The decompressing apparatus 3 determines whether the decoding is completed or not (Step L6). If the decoding
is not completed (if some of the coded data still remain), the decompressing apparatus 3 repeats the process from
the above Step L3 until the decoding is completed (NO route at Step L6). If the decoding is completed, the
decompressing apparatus 3 terminates the decompressing process (YES route at Step L6).

[0214] The decompressing apparatus 3 for an SGML document according to the fifth embodiment creates a tag
decode table having the same contents of codes as the coding side on the basis of tags in the DTD 302 extracted by
the SGML tag extracting unit 200 and information on the frequency of occurrence of each of the tags in the document
instance 303 of the SGML document sent from the coding side. It is therefore possible to accurately decode tags coded
on the coding side. Since a shorter code is assigned in advance to a tag more frequently occurring, in the same manner
as on the coding side, it is possible to improve the efficiency of tag decoding and speed up the decoding process.
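
The decoding side can reproduce the coding-side table only if both sides apply the same deterministic rule to the same inputs (the DTD tags and the transmitted frequency information). A minimal Python sketch of that idea follows. It assumes a rank-based code with ties broken by DTD order, and the names `table_from_frequencies` and `decode_tags` are hypothetical rather than taken from the text.

```python
def table_from_frequencies(dtd_tags, freq):
    """Rank tags by descending frequency; ties keep DTD order (stable sort)."""
    ranked = sorted(dtd_tags, key=lambda t: -freq.get(t, 0))
    return {tag: rank for rank, tag in enumerate(ranked)}


def decode_tags(dtd_tags, freq, codes):
    table = table_from_frequencies(dtd_tags, freq)         # Step L2
    inverse = {code: tag for tag, code in table.items()}
    return [inverse[code] for code in codes]               # Step L4


dtd_tags = ["<doc>", "<title>", "<p>"]
freq = {"<doc>": 1, "<title>": 1, "<p>": 4}   # as sent from the coding side

# Coding side and decoding side derive the table independently.
coder_table = table_from_frequencies(dtd_tags, freq)
sent = [coder_table[t] for t in ["<doc>", "<p>", "<p>", "<title>"]]
received = decode_tags(dtd_tags, freq, sent)
```

Sending the frequencies instead of the finished table keeps the side information small, at the cost of requiring the tie-breaking rule to be agreed in advance.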

(f) Description of a Sixth Embodiment

(f1) Description of a Compressing Apparatus (Coding Side) for an SGML Document 

[0215] FIG. 26 is a block diagram showing a structure of an essential part of a compressing apparatus for an SGML
document according to a sixth embodiment of this invention. A compressing apparatus 2 shown in FIG. 26 has an
SGML tag detecting unit 102', which includes a start-tag holding unit 110 and a start-tag detecting unit 111, instead of
the SGML tag detecting unit 102 shown in FIG. 10.

[0216] The above start-tag holding unit 110 holds only a tag start character (string) ("<" or "</", for example) showing
a start of a tag in the DTD 302 extracted by the SGML tag extracting unit 100. The start-tag detecting unit 111 detects
whether data of inputted document instance 303 is a start-tag or not on the basis of the tag start characters (strings)
(hereinafter referred to as start-tags) held in the start-tag holding unit 110.

[0217] The SGML tag detecting unit (tag discriminating unit) 102' according to this embodiment detects a start-tag
showing a start of a tag on the basis of the tags extracted by the SGML tag extracting unit 100, thereby determining that
input data is a tag.

[0218] According to this embodiment, when the above start-tag is detected, the above start-tag detecting unit 111
gives a direction to the switching control unit 105 so that the start-tag itself ("<" or "</") is coded as data other than tags
by the second coding unit 104, and after that gives a direction to the switching control unit 105 so that data following the
above start-tag is coded as a body of a tag by the tag coding unit 103.

[0219] Next, detailed description will be made of an operation of the compressing apparatus 2 for an SGML document
with the above structure according to the sixth embodiment with reference to a flowchart (Steps M1 through M6) shown
in FIG. 27.

[0220] The compressing apparatus 2 scans the inputted DTD 302 to extract tags defined in the DTD 302 by the SGML
tag extracting unit 100, successively stores the tags in the memory 101, and assigns address information and length
information of the memory 101 to each of the tags to create a tag code table (Step M1).
[0221] At this time, only start-tags among the tags extracted by the SGML tag extracting unit 100 are outputted to the
start-tag holding unit 110. The start-tag holding unit 110 successively holds the inputted start-tags to decide the
start-tags (Step M2).
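
Steps M1 and M2 can be pictured with the short Python sketch below. It is an illustration only: the memory 101 is modelled as a single string in which the tags are stored back to back, a tag's code is the pair (address, length) into that string, and the candidate markers `"</"` and `"<"` as well as both function names are assumptions made for the example.

```python
def build_tag_memory(dtd_tags):
    """Step M1: store tags back to back; a tag's code is (address, length)."""
    memory, table = "", {}
    for tag in dtd_tags:
        table[tag] = (len(memory), len(tag))
        memory += tag
    return memory, table


def collect_start_markers(dtd_tags, known_markers=("</", "<")):
    """Step M2: keep only the tag start strings that occur in the DTD tags."""
    return [m for m in known_markers if any(t.startswith(m) for t in dtd_tags)]


dtd_tags = ["<doc>", "</doc>", "<p>", "</p>"]
memory, table = build_tag_memory(dtd_tags)
markers = collect_start_markers(dtd_tags)
address, length = table["<p>"]
```

With this layout the decoder needs no separate lookup structure, since `memory[address:address + length]` returns the tag directly.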

[0222] The compressing apparatus 2 determines whether inputted document instance data is a start-tag or not by the
start-tag detecting unit 111 (Step M3). If the inputted document instance data is a start-tag, the compressing apparatus
2 directs the switching control unit 105 of the coding process unit 103a to switch output of the document instance data
to the second coding unit 104. The second coding unit 104 codes the input data (start-tag) in a predetermined coding
system.

[0223] After that, the start-tag detecting unit 111 directs the switching control unit 105 to switch the output of the
document instance data to the tag coding unit 103, whereby a body of a tag following the above start-tag is inputted to the
tag coding unit 103. The tag coding unit 103 refers to the memory 101 on the basis of the inputted data (body of the
tag), and outputs an address and a length of the tag as a code of the tag (from YES route at Step M3 to Step M4).
[0224] If the inputted document instance data is not a start-tag, the start-tag detecting unit 111 directs the switching
control unit 105 to switch the output of the document instance data to the second coding unit 104. The second coding
unit 104 codes the document instance data (character or character string) in a predetermined coding system (from NO
route at Step M3 to Step M5).

[0225] The compressing apparatus 2 determines whether the coding is completed or not (Step M6). If the coding is
not completed (if some of the document instance data still remain), the compressing apparatus 2 repeats the process
from the above Step M3 until the coding is completed (NO route at Step M6). If the coding is completed, the
compressing apparatus 2 terminates the compressing process (YES route at Step M6).

[0226] The compressing apparatus 2 for an SGML document according to the sixth embodiment determines whether
the inputted document instance data is a tag or not by detecting a start-tag. It is therefore possible to determine a tag
from a start-tag on the decoding side in a similar manner even if the above COC is not outputted to the decoding side.
Since the COC is not outputted, it is possible to further increase the compression rate of SGML documents.
[0227] Since determination of a tag is done by detecting only a start-tag, it is possible to determine a tag with a simpler
structure and at a high speed. This largely contributes to speeding-up of the tag compressing process.
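
The coding flow of the sixth embodiment (Steps M3 through M5) might look as follows in Python. The sketch makes several assumptions that the text leaves open: the start marker is emitted as one ordinary symbol, the tag code points at the body of the tag inside the stored tag (address of the tag plus the marker length), ordinary characters are passed through uncoded, and the text contains no stray `<` outside the tags defined in the DTD. The function name `encode_without_coc` is hypothetical.

```python
def encode_without_coc(text, dtd_tags):
    """Code tags as start marker + (address, length); other text char by char."""
    memory, offsets = "", {}
    for tag in dtd_tags:                 # Step M1: store tags back to back
        offsets[tag] = len(memory)
        memory += tag
    stream, i = [], 0
    while i < len(text):
        tag = next((t for t in dtd_tags if text.startswith(t, i)), None)
        if tag is None:                  # NO route at Step M3
            stream.append(text[i])       # ordinary data (second coding unit)
            i += 1
            continue
        marker = "</" if tag.startswith("</") else "<"
        stream.append(marker)            # the marker is coded as ordinary data
        body_address = offsets[tag] + len(marker)
        stream.append((body_address, len(tag) - len(marker)))  # Step M4
        i += len(tag)
    return memory, stream


memory, stream = encode_without_coc("<p>hi</p>", ["<p>", "</p>"])
```

No special code appears in the output: the marker symbol itself tells the decoder that the next item is a tag code.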

(f2) Description of a Decompressing Apparatus (Decoding Side) for an SGML Document 

[0228] FIG. 28 is a block diagram showing a structure of an essential part of a decompressing apparatus for an SGML
document according to the sixth embodiment of this invention. A decompressing apparatus 3 shown in FIG. 28
corresponds to the decoding side of the compressing apparatus 2 described above with reference to FIGS. 26 and 27.
According to this embodiment, the decompressing apparatus 3 has an SGML tag detecting unit 202' including a
start-tag holding unit 210 and a start-tag detecting unit 211, instead of the COC discriminating unit 202 shown in FIG. 14.
[0229] The above start-tag holding unit 210 and the start-tag detecting unit 211 are similar to the start-tag holding unit
110 and the start-tag detecting unit 111 on the coding side, respectively. The start-tag holding unit 210 holds only
start-tags ("<", "</", and the like) in the DTD 302 extracted by the SGML tag extracting unit 200. The start-tag detecting unit
211 detects whether a symbol decoded by the second decoding unit 204 is a start-tag or not on the basis of the
start-tags held in the start-tag holding unit 210. When a start-tag is detected, the start-tag detecting unit 211 directs the
switching control unit 205 to switch output of coded data to the tag decoding unit 203 since the coded data following the
start-tag, which is an object of the decoding, is a code of the tag.

[0230] Next, detailed description will be made of an operation of the decompressing apparatus 3 with the above struc- 
ture according to the sixth embodiment with reference to a flowchart (Steps N1 through N6) shown in FIG. 29. 
[0231] The decompressing apparatus 3 scans the inputted DTD 302 to extract tags defined in the DTD 302 by the
SGML tag extracting unit 200, successively stores the extracted tags in the memory 201, and assigns address
information and length information of the memory 201 to each of the tags as a code of a tag to create a tag decode table
(Step N1).

[0232] At this time, only start-tags among the tags extracted by the SGML tag extracting unit 200 are outputted to the 
start-tag holding unit 210. The start-tag holding unit 210 successively holds the inputted start-tags to determine the 
start-tags (Step N2). 

[0233] The decompressing apparatus 3 determines whether a symbol decoded by the second decoding unit 204 is a
start-tag or not by the start-tag detecting unit 211 (Step N3). If the symbol is a start-tag, the decompressing apparatus
3 directs the switching control unit 205 to switch output of coded data (code of body of a tag = address and length)
inputted following the start-tag to the tag decoding unit 203 so that the coded data is outputted to the tag decoding unit 203.

[0234] The tag decoding unit 203 refers to the memory 201 on the basis of the inputted data (address and length),
and outputs a corresponding tag as a result of the decoding (from YES route at Step N3 to Step N4).
[0235] If the symbol decoded by the second decoding unit 204 is not a start-tag, the start-tag detecting unit 211
directs the switching control unit 205 to switch the output of the coded data to the second decoding unit 204. The
second decoding unit 204 decodes the coded data in a decoding system corresponding to the coding system on the coding
side (from NO route at Step N3 to Step

[0236] The decompressing apparatus 3 determines whether the decoding is completed or not (Step N6). If the
decoding is not completed (if some of the coded data still remain), the decompressing apparatus 3 repeats the process
from the above Step N3 until the decoding is completed (NO route at Step N6). If the decoding is completed, the
decompressing apparatus 3 terminates the decompressing process (YES route at Step N6).

[0237] The decompressing apparatus 3 for an SGML document according to the sixth embodiment detects whether
decoded coded data is a start-tag or not to determine a start position of a tag, so as to switch between decoding of a
tag and decoding of a character (string) other than tags without receiving the above COC. It is therefore possible to
accurately decompress tags while further increasing the compression rate on the coding side since no COC is received.

[0238] Since determination of a tag is done by detecting only a start-tag, it is possible to determine a tag with a simpler
structure and at a higher speed. This largely contributes to speeding-up of the tag decoding process.
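
The decoding side of the sixth embodiment can be sketched in Python as a small state machine: every decoded ordinary symbol is checked against the held start-tags, and a match means that the next item is an (address, length) tag code. The sketch assumes that the memory is a string of back-to-back tags, that a tag code addresses the body of the tag within that string, and that ordinary symbols arrive already decoded. The function name `decode_without_coc` and the sample stream are hypothetical.

```python
def decode_without_coc(memory, stream, markers=("</", "<")):
    """Switch to tag decoding whenever the previous symbol was a start marker."""
    out, expect_tag_code = [], False
    for item in stream:
        if expect_tag_code:
            address, length = item
            out.append(memory[address:address + length])  # Step N4: tag body
            expect_tag_code = False
        else:
            out.append(item)                    # ordinary symbol (second decoding unit)
            expect_tag_code = item in markers   # Step N3: start-tag detected?
    return "".join(out)


memory = "<p></p>"   # tags "<p>" and "</p>" stored back to back
stream = ["<", (1, 2), "h", "i", "</", (5, 2)]
text = decode_without_coc(memory, stream)
```

Because the switch is driven entirely by the decoded symbols, no special code has to be transmitted, which is where the additional saving over the COC-based embodiments comes from.
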
[0239] The compressing apparatus 2 for an SGML document in each of the above embodiments can code tags in the
document instance 303 and compress them so as to largely decrease a quantity of data of an SGML document. Since
not only tags but also characters (strings) other than tags are coded in a predetermined coding system to be
compressed, a quantity of data of an SGML document can be largely decreased.

[0240] The decompressing apparatus 3 for an SGML document in each of the above embodiments can decode coded
tags or tags and characters (strings) other than tags, efficiently and accurately. It is therefore possible to accurately
decompress tags or tags and characters (strings) other than tags at any time.

[0241] Each of the above compressing apparatus 2 and decompressing apparatus 3 can be accomplished by providing
the recording medium 15 such as the floppy disk 11, the CD-ROM 12, the MO 13 or the like storing a compression
program and a decompression program having the above functions to the computers 2 and 3. This largely improves
versatility of this invention so that wide adoption of this invention can be expected.

(g) Others

[0242] In each of the above embodiments, the compressing apparatus 2 and the decompressing apparatus 3 are
realized as different units in different personal computers. However, it is alternatively possible to realize both of the
compressing apparatus 2 and the decompressing apparatus 3 as a compressing/decompressing apparatus in one personal
computer.

[0243] When the compressing apparatus 2 (refer to FIG. 10) and the decompressing apparatus 3 (refer to FIG. 14)
described before in the third embodiment are realized in one personal computer, the structure is as shown in
FIG. 30.

[0244] In this case, the decoding side may use a tag code table created on the coding side to decode tags. Therefore,
the memory 101 is commonly used on both of the coding and decoding sides (functioning as a tag code/decode table
creating unit), as shown in FIG. 30. An operation of each part of the compressing/decompressing apparatus for an
SGML document shown in FIG. 30 is similar to that described in the third embodiment, detailed description of which is
omitted here.

[0245] When the tags are decoded, the above compressing/decompressing apparatus for an SGML document
decodes tags on the basis of stored contents (tag code/decode table) of the memory 101 created and used when the
tags are coded. Accordingly, it is at least unnecessary to separately create a decode table for decoding tags and a code
table for coding the tags, unlike each of the above embodiments. This largely contributes to speeding-up of the tag
decoding (decompressing) process and a decrease of a scale of the apparatus.

[0246] As to the compressing apparatus 2 and the decompressing apparatus 3 in each of the above embodiments
excepting the third embodiment, it is possible to realize them as a compressing/decompressing apparatus in one
apparatus (personal computer), as well.

[0247] In each of the above embodiments, a tag defined in the DTD 302 of an SGML document is extracted and
a code is assigned thereto. However, if a tag is also defined in the SGML declaration 301 as well as the DTD 302, the tag
in the SGML declaration 301 may be extracted and a code assigned thereto.

[0248] Further, in each of the above embodiments, only the document instance 303 of an SGML document is
compressed/decompressed. However, it is possible to compress/decompress portions (SGML declaration 301 and DTD
302) other than the document instance 303.

Claims

1. A tag document compressing apparatus (2) for coding a tag document, having a document type definition defining
tags showing a document structure and a document instance described using the tags defined in said document
type definition, to compress said tag document, the apparatus comprising:

a tag extracting unit (30) for scanning the document definition of an inputted tag document to extract the tags;
a tag code table creating unit (40) for assigning a predetermined code to each tag in said document definition
on the basis of the tags extracted by said tag extracting unit (30), to create a tag code table; and
a tag coding unit (60) for coding the tags in said document instance on the basis of said tag code table created
by said tag code table creating unit (40).

2. The tag document compressing apparatus according to claim 1, wherein when a plurality of tag documents having
the same document type definition are coded, said tag coding unit (60) codes tags in the document instances of all
of the tag documents on the basis of a tag code table created with respect to the first tag document by said tag
extracting unit (30) and said tag code table creating unit (40).

3. A tag document compressing apparatus (2) for coding a tag document having a document type definition defining
tags showing a document structure and a document instance described using the tags defined in said document
type definition to compress said tag document, said apparatus comprising:

a tag extracting unit (100) for scanning the document type definition of an inputted tag document to extract the
tags;
a tag code creating unit for assigning a predetermined code to each tag in said document type definition on the
basis of the tags extracted by said tag extracting unit (100), to create a tag code table;
a tag discriminating unit (102) for determining whether data in said inputted document instance is a tag
extracted by said tag extracting unit;
a coding process unit (103a) for coding said inputted data on the basis of said tag code table when said tag
discriminating unit (102) determines that said inputted data is a tag, and coding said inputted data in a
predetermined coding system when said tag discriminating unit (102) determines that said inputted data is not a tag;
and
a special code outputting unit (106) for outputting a special code showing coding of a tag to a decoding side of
said tag before said inputted data is coded when said tag discriminating unit (102) discriminates that said
inputted data is a tag.
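
The switching behaviour recited in claim 3 can be illustrated, purely as a sketch, by the following Python fragment. The sentinel `SPECIAL` stands in for the special code (called COC in FIG. 13), and the "predetermined coding system" for non-tag data is reduced to passing characters through unchanged; both choices, like the function name and the tag-matching regular expression, are assumptions for the example rather than anything the claim prescribes.

```python
import re

SPECIAL = object()  # stand-in for the special code announcing a coded tag

def encode(instance: str, tag_table: dict[str, int]) -> list:
    """Emit SPECIAL + tag code for known tags, plain characters otherwise."""
    out = []
    pos = 0
    for m in re.finditer(r"</?[A-Za-z][^<>]*>", instance):
        out.extend(instance[pos:m.start()])   # non-tag data before this tag
        tag = m.group(0)
        if tag in tag_table:                  # role of tag discriminating unit (102)
            out.append(SPECIAL)               # role of special code outputting unit (106)
            out.append(tag_table[tag])        # coding based on the tag code table
        else:
            out.extend(tag)                   # unknown markup is coded as ordinary data
        pos = m.end()
    out.extend(instance[pos:])                # trailing non-tag data
    return out

# Tag code table taken from FIG. 13: <B> is 0 and </B> is 1.
symbols = encode("ab<B>cd</B>e", {"<B>": 0, "</B>": 1})
```

The decoding side can tell tag codes from character codes because every tag code is preceded by the special code.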


4. The tag document compressing apparatus according to claim 3, wherein said coding process unit (103a)
comprises:

a first coding unit (103) for coding said inputted data on the basis of said tag code table;
a second coding unit (104) for coding said inputted data in a predetermined coding system; and
a switching control unit (105) for outputting said inputted data to said first coding unit (103) when said tag
discriminating unit (102) determines that said inputted data is a tag, and outputting said inputted data to said
second coding unit (104) when said tag discriminating unit (102) determines that said inputted data is not a tag.

5. The tag document compressing apparatus of claim 3 or 4, wherein said tag code table creating unit has a tag
storing unit (101) for storing the tags extracted by said tag extracting unit (100), and assigns information on a position
in which each tag is stored in said tag storing unit (101) as a code of said tag to create said tag code table.

6. The tag document compressing apparatus according to claim 5, wherein said information on a storing position is
information including address information of said tag storing unit (101).

7. The tag document compressing apparatus according to claim 6, wherein said information on a storing position is
said address information and information on a length of a relevant tag.
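
To make the position-based codes of claims 5 to 7 concrete, the following Python sketch models the tag storing unit as a single string in which the extracted tags are stored one after another, and uses the pair (address, length) as the code of each tag. The function name and the choice of a string as the memory model are assumptions for illustration only.

```python
def store_tags(tags: list[str]) -> tuple[str, dict[str, tuple[int, int]]]:
    """Store tags successively and code each one as (address, length)."""
    memory = ""
    table = {}
    for tag in tags:
        if tag not in table:                  # store each distinct tag once
            table[tag] = (len(memory), len(tag))
            memory += tag
    return memory, table

memory, table = store_tags(["<B>", "</B>", "<TITLE>"])

# A decoder holding the same memory recovers a tag by slicing.
address, length = table["</B>"]
recovered = memory[address:address + length]
```

Because the decoding side builds the same memory from the same document type definition, the address and length alone are enough to restore the tag.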

8. The tag document compressing apparatus of any of claims 3 to 7, wherein said tag code table creating unit
comprises:

a first coding dictionary creating unit (107) for assigning a predetermined initial code to each tag extracted by
said tag extracting unit (100) to create a first coding dictionary of the tags as said tag code table; and
a coding dictionary updating unit (108) for updating said code in said first coding dictionary created by said first
coding dictionary creating unit (107) according to the frequency of occurrence of a corresponding tag when
said coding process unit (103a) codes said tag.
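
One possible reading of the dictionary update in claim 8 is sketched below in Python: each tag starts with its position in the extraction order as its initial code, and after a tag is coded its occurrence count is incremented and the tag is moved ahead of less frequent tags, so that frequent tags drift toward small codes. The class name and this particular update rule are assumptions; the claim does not specify how the code is updated, only that it follows the frequency of occurrence.

```python
class AdaptiveTagDictionary:
    """Tag dictionary whose codes are positions, reordered by frequency."""

    def __init__(self, tags):
        self.order = list(tags)               # initial code = extraction position
        self.count = {t: 0 for t in self.order}

    def _update(self, tag):
        """Count one occurrence and move the tag ahead of rarer tags."""
        self.count[tag] += 1
        i = self.order.index(tag)
        while i > 0 and self.count[self.order[i - 1]] < self.count[tag]:
            self.order[i - 1], self.order[i] = self.order[i], self.order[i - 1]
            i -= 1

    def encode(self, tag):
        code = self.order.index(tag)          # code before the update
        self._update(tag)
        return code

    def decode(self, code):
        tag = self.order[code]
        self._update(tag)                     # same update keeps both sides in step
        return tag

tags = ["<A>", "<B>", "<C>"]
sequence = ["<C>", "<C>", "<A>", "<C>"]
encoder = AdaptiveTagDictionary(tags)
codes = [encoder.encode(t) for t in sequence]
decoder = AdaptiveTagDictionary(tags)
decoded = [decoder.decode(c) for c in codes]
```

The encoder and the decoder apply the identical update after every tag, which is what allows the decoder to follow the changing codes without receiving the dictionary.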

9. The tag document compressing apparatus of any of claims 3 to 8, wherein said tag code table creating unit
comprises:

a second coding dictionary creating unit (109) for counting the frequency of occurrence of each tag in said
document instance on the basis of the tags extracted by said tag extracting unit (100), and assigning a code
according to a result of the counting to each tag to create a second coding dictionary of said tag as said tag
code table.

10. The tag document compressing apparatus according to claim 9 further comprising an occurrence frequency
information outputting unit (112) for outputting information on the frequency of occurrence of each tag to said decoding
side of said tag.

11. The tag document compressing apparatus according to claim 9, wherein said second coding dictionary creating
unit (109) comprises:

a tag counting unit (151) for determining whether each tag extracted by said tag extracting unit (100) coincides
with other tags in said document instance to count the frequency of occurrence of said tag in said document
instance;
a code generating unit (154) for generating a code according to a result of the counting by said tag counting
unit (151); and
a code holding unit (155) for holding said code generated by said code generating unit (154) to create said
second coding dictionary.
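
The counting approach of claims 9 to 11 can be sketched as follows, under the assumption (not stated in the claims) that the generated code is simply the rank of the tag when tags are sorted by descending occurrence count. The function name and the tag-matching regular expression are likewise hypothetical.

```python
import re

def build_frequency_dictionary(tags: list[str], instance: str):
    """Count tag occurrences in the instance and code tags by frequency rank."""
    counts = {tag: 0 for tag in tags}
    for found in re.findall(r"</?[A-Za-z][^<>]*>", instance):
        if found in counts:                   # coincides with an extracted tag
            counts[found] += 1                # role of tag counting unit (151)
    # sorted() is stable, so tags with equal counts keep their extraction order.
    ranked = sorted(tags, key=lambda tag: -counts[tag])
    dictionary = {tag: code for code, tag in enumerate(ranked)}
    return dictionary, counts

tags = ["<A>", "</A>", "<B>", "</B>"]
dictionary, counts = build_frequency_dictionary(tags, "<A><B>x</B><B>y</B></A>")
```

The returned counts correspond to the occurrence frequency information of claim 10: a decoder given the same tags and counts can rebuild the same dictionary.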

12. A tag document compressing apparatus (2) for coding a tag document, having a document type definition defining
tags showing a document structure and a document instance described using said tag defined in said document
type definition, to compress said tag document, the apparatus comprising:

a tag extracting unit (100) for scanning said document type definition of an inputted tag document to extract the
tags;
a tag code table creating unit for assigning a predetermined code to each tag in said document type definition
on the basis of the tags extracted by said tag extracting unit (100), to create a tag code table;
a tag discriminating unit (102') for determining whether inputted data in said document instance is a tag
extracted by said tag extracting unit (100); and
a coding process unit (103a) for coding said inputted data on the basis of said tag code table when said tag
discriminating unit (102') determines that said inputted data is a tag, and coding said inputted data in a
predetermined coding system when said tag discriminating unit (102') determines that said inputted data is not a tag.

13. The tag document compressing apparatus according to claim 12, wherein said tag discriminating unit (102') detects
a start-tag showing a start of a tag on the basis of the tags extracted by said tag extracting unit (100) to determine
that said inputted data is a tag.

14. A tag document decompressing apparatus (3) for decoding a coded tag document, having a document type defini- 
tion defining a tag showing a document structure and a document instance described using said tag defined in said 
document type definition, to decompress said coded tag document, the apparatus comprising: 

a tag extracting unit (30') for scanning said document type definition of an inputted tag document to extract the 
tags; 

a tag decode table creating unit (40') for assigning a predetermined code to each tag in said document type 
definition on the basis of the tags extracted by said tag extracting unit (30') to create a tag decode table; and 
a tag decoding unit (60') for decoding the tags in said coded document instance on the basis of said tag decode 
table created by said tag decode table creating unit (40').

15. The tag document decompressing apparatus according to claim 14, wherein when a plurality of tag documents
having the same document type definition are decoded, said tag decoding unit (60') decodes tags in document
instances of all of said tag documents on the basis of said tag decode table created with respect to the first tag
document by said tag extracting unit (30') and said tag decode table creating unit (40').

16. A tag document decompressing apparatus (3) for decoding a coded tag document, having a document type
definition defining a tag showing a document structure and a document instance described using said tag defined in said
document type definition, to decompress said coded tag document, said apparatus comprising:

a tag extracting unit (200) for scanning the document definition of an inputted tag document to extract the tags;
a tag decode table creating unit for assigning a predetermined code to each tag in said document type
definition on the basis of the tags extracted by said tag extracting unit (200) to create a tag decode table;
a special code discriminating unit (202) for determining whether inputted coded data is a special code showing
inputting of coded data of a tag; and
a decoding process unit (203a) for decoding coded data following said special code on the basis of said tag
decode table when said special code discriminating unit (202) determines that said coded data is said special
code, and decoding said coded data in a predetermined decoding system when said special code
discriminating unit (202) determines that said coded data is not said special code.
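
A decoding loop in the spirit of claim 16 might look like the Python sketch below. As before, `SPECIAL` is a stand-in for the special code, the "predetermined decoding system" for non-tag data is reduced to passing characters through, and the function name and the symbol-list input format are assumptions made for this illustration.

```python
SPECIAL = object()  # stand-in for the special code announcing a coded tag

def decode(symbols, decode_table: dict[int, str]) -> str:
    """Restore tags that follow SPECIAL; pass other symbols through."""
    out = []
    stream = iter(symbols)
    for symbol in stream:
        if symbol is SPECIAL:                 # role of special code discriminating unit (202)
            tag_code = next(stream)           # the coded tag follows the special code
            out.append(decode_table[tag_code])
        else:
            out.append(symbol)                # non-tag data, decoded as-is here
    return "".join(out)

# Tag decode table mirroring FIG. 13: 0 is <B> and 1 is </B>.
text = decode(["a", SPECIAL, 0, "b", SPECIAL, 1], {0: "<B>", 1: "</B>"})
```

The tag decode table is the inverse of the table used on the coding side, which both sides can derive independently from the same document type definition.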

17. The tag document decompressing apparatus according to claim 16, wherein said decoding process unit (203a)
comprises:

a first decoding unit (203) for decoding said inputted coded data on the basis of said tag decode table;
a second decoding unit (204) for decoding said inputted coded data in a predetermined decoding system; and
a switching control unit (205) for outputting coded data following said special code to said first decoding unit
(203) when said special code discriminating unit (202) determines that said coded data is said special code,
and outputting said coded data to said second decoding unit (204) when said special code discriminating unit
(202) determines that said coded data is not said special code.

18. The tag document decompressing apparatus of claim 16 or 17, wherein said tag decode table creating unit has a
tag storing unit (201) for storing the tags extracted by said tag extracting unit (200), and assigns information on a
position in which each tag is stored in said tag storing unit (201) as a code of said tag to create said tag decode
table.

19. The tag document decompressing apparatus according to claim 18, wherein said information on a storing position
is information including address information of said tag storing unit (201).

20. The tag document decompressing apparatus according to claim 19, wherein said information on a storing position
is said address information and information on a length of a relevant tag.

21. The tag document decompressing apparatus according to claim 16, wherein said tag decode table creating unit
comprises:

a first decoding dictionary creating unit (207) for assigning a predetermined initial code to a tag extracted by
said tag extracting unit (200) to create a first decoding dictionary of said tags as said tag decode table; and
a decoding dictionary updating unit (208) for updating said code in said first decoding dictionary created by
said first decoding dictionary creating unit (207) according to the frequency of occurrence of a corresponding
tag when said decoding process unit (203a) decodes said tag.

22. The tag document decompressing apparatus of any of claims 16 to 21, wherein said tag decode table creating unit
comprises:

a second decoding dictionary creating unit (209) for creating a second decoding dictionary of said tags on the
basis of the tags extracted by said tag extracting unit (200) and information on the frequency of occurrence of
the tags.


23. A tag document decompressing apparatus (3) for decoding a coded tag document, having a document type
definition defining a tag showing a document structure and a document instance described using said tag defined in said
document type definition, to decompress said coded tag document, said apparatus comprising:

a tag extracting unit (200) for scanning the document type definition of an inputted tag document to extract the
tags;
a tag decode table creating unit for assigning a predetermined code to the tags in said document type definition
on the basis of the tags extracted by said tag extracting unit (200) to create a tag decode table;
a tag code discriminating unit (202") for determining whether inputted coded data is coded data of a tag; and
a decoding process unit (203a) for decoding said coded data on the basis of said tag decode table when said
tag code discriminating unit (202") determines that said coded data is a tag, and decoding said coded data in
a predetermined decoding system when said code discriminating unit (202") determines that said coded data
is not a tag.

24. The tag document decompressing apparatus according to claim 23, wherein said tag code discriminating unit
(202") detects a start-tag showing a start of a tag on the basis of said tag extracted by said tag extracting unit (200)
to determine that said coded data is a tag.

25. A tag document compressing/decompressing apparatus for coding a tag document having a document type
definition defining tags showing a document structure and a document instance described using said tag defined in said
document type definition to compress said tag document, and decoding said coded tag document to decompress
the same, the apparatus comprising:

a tag extracting unit (100) for scanning said document type definition of an inputted tag document to extract the tags;
a tag code/decode table creating unit for assigning a predetermined code to each tag in said document type
definition on the basis of the tags extracted by said tag extracting unit (100) to create a tag code/decode table;
a tag coding unit (103) for coding the tags in said document instance on the basis of said tag code/decode table
created by said tag code/decode table creating unit; and
a tag decoding unit (203) for decoding the tags in said document instance coded by said tag coding unit on the
basis of said tag code/decode table created by said tag code/decode table creating unit.

26. A tag document compressing/decompressing apparatus for coding a tag document having a document type
definition defining tags showing a document structure and a document instance described using the tags defined in said
document type definition to compress said tag document, and decoding said coded tag document to decompress
the same, comprising:

a tag extracting unit (100) for scanning said document type definition of an inputted tag document to extract the
tags;
a tag code/decode table creating unit for assigning a predetermined code to each tag in said document type
definition on the basis of said tag extracted by said tag extracting unit (100) to create a tag code/decode table;
a tag discriminating unit (102) for determining whether inputted data in said document instance is a tag
extracted by said tag extracting unit;
a coding process unit (103a) for coding said inputted data based on said tag code/decode table when said tag
discriminating unit (102) determines that said inputted data is a tag, and coding said inputted data in a
predetermined coding system when said tag discriminating unit (102) determines that said inputted data is not a tag;
a special code outputting unit (106) for outputting a special code showing coding of a tag before said inputted
data is coded when said tag discriminating unit (102) determines that said inputted data is a tag;
a special code discriminating unit (202) for determining whether coded data outputted from said coding
process unit (103a) is said special code; and
a decoding process unit (203a) for decoding coded data following said special code outputted from said coding
process unit (103a) on the basis of said tag code/decode table when said special code discriminating unit (202)
determines that said coded data is said special code, and decoding said coded data outputted from said
coding process unit (103a) in a predetermined decoding system when said special code discriminating unit (202)
determines that said coded data is not said special code.

27. A tag document compressing method for coding a tag document having a document type definition defining tags 
showing a document structure and a document instance described using the tags defined in the document type 
definition to compress said tag document, the method comprising the steps of: 

assigning a predetermined code to each tag in said document type definition to create a tag code table, and 
coding the tags in said document instance on the basis of said tag code table.

28. The tag document compressing method according to claim 27, wherein when a plurality of tag documents having
the same document type definition are coded, tags in the document instances of all of said tag documents are
coded on the basis of said tag code table created with respect to the first tag document.

29. A tag document compressing method for coding a tag document having a document type definition defining tags
showing a document structure and a document instance described using the tags defined in said document type
definition to compress said tag document, the method comprising the steps of:

assigning a predetermined code to each tag in said document type definition to create a tag code table;
outputting a special code showing coding of a tag to a decoding side of said tag when inputted data of said
document instance is a tag and coding said inputted data on the basis of said tag code table, and coding said
inputted data in a predetermined coding system when said inputted data is not a tag.

30. A tag document compressing method for coding a tag document having a document type definition defining tags
showing a document structure and a document instance described using said tag defined in said document type
definition to compress said tag document, the method comprising the steps of:

assigning a predetermined code to each tag to create a tag code table;
coding inputted data in said document instance on the basis of said tag code table when said inputted data is
a tag, and coding said inputted data in a predetermined coding system when said inputted data is not a tag.

31. A tag document decompressing method for decoding a coded tag document having a document type definition
defining tags showing a document structure and a document instance described using the tags defined in said
document type definition to decompress said coded tag document, the method comprising the steps of:

assigning a predetermined code to each tag in said document type definition to create a tag decode table; and
decoding the tags in said coded document instance on the basis of said tag decode table.

32. The tag document decompressing method according to claim 31, wherein when a plurality of tag documents having
the same document type definition are decoded, tags in the document instances of all of said tag documents are
decoded on the basis of a tag decode table created with respect to the first tag document.

33. A tag document decompressing method for decoding a coded tag document having a document type definition
defining tags showing a document structure and a document instance described using the tags defined in said
document type definition to decompress said coded tag document, the method comprising the steps of:

assigning a predetermined code to each tag in said document type definition to create a tag decode table; and
decoding coded data inputted following a special code showing that coded data is inputted on the basis of said
tag decode table when said inputted coded data is said special code, and decoding said coded data in a
predetermined decoding system when said inputted coded data is not said special code.

34. A tag document decompressing method for decoding a coded tag document having a document type definition
defining tags showing a document structure and a document instance described using the tags defined in said
document type definition to decompress said coded tag document, the method comprising the steps of:

assigning a predetermined code to the tags in said document type definition to create a tag decode table; and
decoding inputted coded data on the basis of said tag decode table when said inputted coded data is coded
data of a tag, and decoding said inputted coded data in a predetermined decoding system when said inputted
coded data is not coded data of a tag.


35. A tag document compressing/decompressing method for coding a tag document having a document type definition
defining tags showing a document structure and a document instance described using the tags defined in said
document type definition to compress said tag document, and decoding said coded tag document to decompress the
same, wherein the method comprises the steps of:

assigning a predetermined code to each tag in said document type definition to create a tag code/decode
table; and
coding the tags in said document instance on the basis of said tag code/decode table, and decoding said
coded tags on the basis of said tag code/decode table.

36. A tag document compressing/decompressing method for coding a tag document having a document type definition
defining a tag showing a document structure and a document instance described using said tag defined in said
document type definition to compress said tag document, and decoding said coded tag document to decompress
the same, wherein the method comprises the steps of:

assigning a predetermined code to each tag in said document type definition to create a tag code/decode
table;
outputting a special code showing coding of a tag when inputted data in said document instance is a tag and
coding said inputted data on the basis of said tag code/decode table, and coding said inputted data in a
predetermined coding system when said inputted data is not a tag; and
when coded data is decoded, decoding coded data following said special code on the basis of said tag
code/decode table when said coded data is said special code, and decoding said coded data in a
predetermined decoding system when said coded data is not said special code.

37. A recording medium (15) readable by a computer, storing a tag document compressing program for coding a tag 
document having a document type definition defining tags showing a document structure and a document instance 
described using the tags defined in said document type definition to compress said tag document, characterised in 
that said tag document compressing program makes said computer (26) function as a tag extracting unit (30) for 
scanning said document type definition of an inputted tag document to extract the tags, a tag code table creating 
unit (40) for assigning a predetermined code to each tag on the basis of the tags extracted by said tag extracting 
unit (30) to create a tag code table, and a tag coding unit (60) for coding the tags in said document instance on the 
basis of said tag code table created by said tag code table creating unit (40).

38. A recording medium (15) readable by a computer, storing a tag document compressing program for coding a tag
document having a document type definition defining tags showing a document structure and a document instance
described using the tags defined in said document type definition to compress said tag document, characterised in
that said tag document compressing program makes said computer (26) function as a tag extracting unit (100) for
scanning said document type definition of an inputted tag document to extract the tags, a tag code table creating
unit (101) for assigning a predetermined code to each tag in said document type definition on the basis of the tags
extracted by said tag extracting unit (100) to create a tag code table, a tag discriminating unit (102) for determining
whether inputted data in said document instance is a tag extracted by said tag extracting unit (100), a coding
process unit (103a) for coding said inputted data on the basis of said tag code table when said tag discriminating unit
(102) determines that said inputted data is a tag, and coding said inputted data in a predetermined coding system
when said tag discriminating unit (102) determines that said inputted data is not a tag, and a special code
outputting unit (106) for outputting a special code showing coding of a tag to a decoding side of said tag before said
inputted data is coded when said tag discriminating unit (102) determines that said inputted data is a tag.

39. A recording medium (15) readable by a computer, storing a tag document decompressing program for decoding a
coded tag document having a document type definition defining tags showing a document structure and a
document instance described using the tags defined in said document type definition to decompress said coded tag
document, characterised in that said tag document decompressing program makes said computer (26) function as
a tag extracting unit (30') for scanning said document type definition of an inputted tag document to extract the tags,
a tag decode table creating unit (40') for assigning a predetermined code to each tag in said document type
definition on the basis of the tags extracted by said tag extracting unit (30') to create a tag decode table, and a tag
decoding unit (60') for decoding the tags in said coded document instance on the basis of said tag decode table created
by said tag decode table creating unit (40').

40. A recording medium (15) readable by a computer, storing a tag document decompressing program for decoding a
coded tag document having a document type definition defining tags showing a document structure and a
document instance described using the tags defined in said document type definition to decompress said tag document,
characterised in that said tag document decompressing program makes said computer (26) function as a tag
extracting unit (200) for scanning said document type definition of an inputted tag document to extract the tags, a
tag decode table creating unit (201) for assigning a predetermined code to each tag in said document type
definition on the basis of the tags extracted by said tag extracting unit (200), a special code discriminating unit (202) for
determining whether inputted coded data is a special code showing that coded data of a tag is inputted, and a
decoding process unit (203a) for decoding coded data inputted following said special code on the basis of said tag
decode table when said special code discriminating unit (202) determines that said coded data is said special code,
and decoding said coded data in a predetermined decoding system when said special code discriminating unit
(202) determines that said coded data is not said special code.

41. A recording medium (15) readable by a computer, storing a tag document compressing/decompressing program
for coding a tag document having a document type definition defining tags showing a document structure and a
document instance described using the tags defined in said document type definition to compress said tag
document and decoding said coded tag document to decompress the same, characterized in that said tag document
compressing/decompressing program makes said computer (26) function as a tag extracting unit (100) for
scanning said document type definition of an inputted tag document to extract the tags, a tag code/decode table creating
unit (101) for assigning a predetermined code to each tag on the basis of the tags extracted by said tag extracting
unit (100) to create a tag code/decode table, a tag coding unit (103) for coding the tags in said document instance
on the basis of said tag code/decode table created by said tag code/decode table creating unit (101), and a tag
decoding unit (203) for decoding the tags in said document instance coded by said tag coding unit (103) on the
basis of said tag code/decode table created by said tag code/decode table creating unit (101).

42. A recording medium (15) readable by a computer, storing a tag document compressing/decompressing program
for coding a tag document having a document type definition defining tags showing a document structure and a
document instance described using the tags defined in said document type definition to compress said tag
document and decoding said coded tag document to decompress the same, characterized in that said tag document
compressing/decompressing program makes said computer (26) function as a tag extracting unit (100) for
scanning said document type definition of an inputted tag document to extract the tags, a tag code/decode table creating
unit (101) for assigning a predetermined code to each tag in said document type definition on the basis of the tags
extracted by said tag extracting unit (100) to create a tag code/decode table, a tag discriminating unit (102) for
determining whether inputted data in said document instance is a tag extracted by said tag extracting unit (100), a
coding process unit (103a) for coding said inputted data on the basis of said tag code/decode table when said tag
discriminating unit (102) determines that said inputted data is a tag, and coding said inputted data in a
predetermined system when said tag discriminating unit (102) determines that said inputted data is not a tag, a special code
outputting unit (106) for outputting a special code showing coding of a tag before said inputted data is coded when
said tag discriminating unit (102) determines that said inputted data is a tag, and a decoding process unit (203a)
for decoding coded data following said special code outputted from said coding process unit (103a) on the basis of
said tag code/decode table when said special code discriminating unit (202) determines that said coded data is
said special code, and decoding said coded data in a predetermined decoding system when said special code
discriminating unit (202) determines that said coded data is not said special code.


EP 0 896 284 A1

FIG. 1 [drawing not reproduced; legible labels: recording medium (15); FD (11); CD-ROM (12); MO, etc. (13); keyboard]

FIG. 2 [rotated drawing; text not recoverable]

FIG. 3 [flowchart; legible boxes in order: START; scan DTD to extract tags; create a tag code table; code tags in the SGML document; END]

FIG. 4 [rotated drawing; text not recoverable]

FIG. 5 [flowchart with step labels B1 to B3; legible boxes in order: START; scan DTD to extract tags; create a tag decode table; decode coded tags in the SGML document; END]

FIG. 6 [rotated drawing; text not recoverable]

FIG. 7 [flowchart with step labels C1 to C4; legible boxes in order: START; decision "coincide?" with a NO branch; scan DTD to extract tags; create a tag code table; code tags in the SGML document; END]

FIG. 8 [rotated drawing; text not recoverable]

FIG. 9 [flowchart; legible boxes in order: START; decision "coincide?" with a NO branch; scan DTD to extract tags; create a tag decode table; decode coded tags in the SGML document; END]

FIG. 10 [rotated drawing; text not recoverable]

FIG. 11 [drawing not reproduced]

FIG. 12 [flowchart with step labels E1 to E6; legible boxes: scan DTD to successively store tags in the memory; decision "is a character to be coded a tag?" (YES/NO); output a special code COC; output address and length of a tag in the memory; code a character or a character string; decision "is the coding completed?" (YES/NO); END]

FIG. 13 [diagram with step labels F1 to F3; legible content only]

SGML document: text containing the tags <B> and </B> (text characters not legible)
Tag code table (101): <B> : 0, </B> : 1
SGML document in which tags are coded: each tag is replaced by COC followed by its tag code ("0" or "1")
COC is a special code signifying a switching of the process; "0" is a code in binary notation
Code table (104a): columns "character" and "code"; COC : 10 (the character entries are not legible)
Code: 0xff9e7b2eb2
(11111/11110/0111/10/0/11110/1100/10/1/1101/0110/010)
0x... is a code in hexadecimal notation; in parenthesis in the lower column is a code in binary notation

FIG. 14 [rotated drawing; text not recoverable]

FIG. 15 
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